Created
July 22, 2013 15:43
This gist shows how to properly handle encoding issues in Python 2.x
| # All data coming across the intarwebs is encoded in some character encoding. | |
| # This could be ASCII, UTF-8, UTF-16, Shift-JIS, etc. To properly | |
| # handle data, you need to know the encoding. Thankfully on the web | |
| # the de facto standard seems to be moving towards UTF-8. | |
| # | |
| # To deal with data safely, decode this encoded data (a byte string, | |
| # i.e. `str` in Python 2) from its encoding to the generic unicode | |
| # type, which Python can work with regardless of the source encoding. | |
| # Let's pretend we have some data foo we have just read in from the | |
| # intarwebs, and that we know it is UTF-8: | |
| bar = foo.decode('utf-8') | |
| # bar is now safe to work with - no UnicodeDecodeErrors! When working | |
| # with hard coded text strings, it's always good to write them like | |
| # this so they are unicode and not byte strings: | |
| hello = u'hello' # good! (unicode) | |
| goodbye = 'goodbye' # bad! (byte string) | |
| # The other thing you need to know is that when you send data out | |
| # of your program you need to now *encode* it from its unicode | |
| # representation to an encoding. Once again, utf-8 is always | |
| # a fine choice | |
| print bar.encode('utf-8') | |
| with open('output.txt', 'wb') as fp: # binary mode: we write encoded bytes | |
|     fp.write(bar.encode('utf-8')) | |
| # And that's basically it - the key is to know that at the edges | |
| # of your program, i.e. as data is brought in or sent out, you should | |
| # be decoding/encoding, and only working with unicode internally. | |
| # It's a bit clunky, but once you get used to it and act in the | |
| # manner above, it's nice because it's all very deliberate. |
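
The "decode on the way in, encode on the way out" pattern described above can be sketched as a pair of small helpers. This is only a minimal illustration, not part of the original gist: the names `to_text` and `to_bytes`, the sample bytes in `foo`, and the temporary output path are all assumptions made for the example. It also uses `io.open`, which takes an `encoding` argument and does the encoding for you on write, as one possible alternative to calling `.encode()` by hand.

```python
import io
import os
import tempfile


def to_text(raw, encoding='utf-8'):
    """Input edge: decode a byte string; pass unicode through untouched."""
    # `bytes` is an alias for `str` on Python 2, so this check works on both.
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def to_bytes(text, encoding='utf-8'):
    """Output edge: encode unicode text into a byte string."""
    return text.encode(encoding)


# Hypothetical input: five UTF-8 bytes, as if just read from a socket.
foo = b'caf\xc3\xa9'
bar = to_text(foo)

# Internally we only handle unicode, so len() counts characters, not bytes.
assert len(foo) == 5
assert len(bar) == 4

# io.open encodes on write, so the unicode value can be written directly.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with io.open(path, 'w', encoding='utf-8') as fp:
    fp.write(bar)

# Reading the file back in binary mode shows the bytes that hit the disk.
with io.open(path, 'rb') as fp:
    round_trip = fp.read()
```

Keeping the two conversions in named helpers makes the edges of the program easy to spot, and `to_text` tolerates being handed a value that has already been decoded.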