@jimjkelly
Created July 22, 2013 15:43
  1. jimjkelly created this gist Jul 22, 2013.
    35 changes: 35 additions & 0 deletions python-encoding.py
    @@ -0,0 +1,35 @@
    # All data coming across the intarwebs is encoded in some character
    # encoding. This could be ASCII, UTF-8, UTF-16, Shift-JIS, etc. To
    # properly handle data, you need to know its encoding. Thankfully, on
    # the web the de facto standard seems to be moving towards UTF-8.
    #
    # To deal with data safely, you want to decode this encoded data
    # (referred to in the Python 2 world as a byte string) from its
    # encoding to the generic unicode data type, which Python can
    # work with safely in all situations. Let's pretend we have some
    # data foo that we have just read in from the intarwebs:

    bar = foo.decode('utf-8')

    # bar is now safe to work with - no UnicodeDecodeErrors! When working
    # with hard-coded text strings, it's always good to write them like
    # this so they are unicode and not byte strings:

    hello = u'hello' # good!
    goodbye = 'goodbye' # bad!

    # The other thing you need to know is that when you send data out
    # of your program, you need to *encode* it from its unicode
    # representation to an encoding. Once again, utf-8 is usually
    # a fine choice:

    print bar.encode('utf-8')
    with open('output.txt', 'w') as fp:
        fp.write(bar.encode('utf-8'))

    # And that's basically it - the key is to know that at the edges
    # of your program, i.e. as data is brought in or sent out, you should
    # be decoding/encoding, and only working with unicode internally.

    # It's a bit clunky, but once you get used to it and act in the
    # manner above, it's nice because it's all very deliberate.
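
    # The whole round trip described above can be sketched in one place.
    # This is only a minimal illustration, not part of the original gist:
    # the helper names (decode_input, encode_output, write_text) and the
    # sample bytes are made up for the example, and it assumes the input
    # really is UTF-8. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io


def decode_input(raw, encoding='utf-8'):
    """Decode bytes arriving at the edge of the program into unicode."""
    return raw.decode(encoding)


def encode_output(text, encoding='utf-8'):
    """Encode unicode back into bytes just before it leaves the program."""
    return text.encode(encoding)


def write_text(path, text, encoding='utf-8'):
    """Write unicode to a file, letting io.open do the encoding for us."""
    with io.open(path, 'w', encoding=encoding) as fp:
        fp.write(text)


# Hypothetical input standing in for foo: the UTF-8 bytes for "cafe"
# with an accented final e (5 bytes, 4 characters).
foo = b'caf\xc3\xa9'

bar = decode_input(foo)      # input edge: bytes -> unicode
shout = bar.upper()          # internal work happens on unicode only
out = encode_output(shout)   # output edge: unicode -> bytes
```

    # Note that len(foo) is 5 while len(bar) is 4: the byte string counts
    # bytes, the unicode string counts characters. write_text is an
    # alternative to calling .encode() by hand before every fp.write().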