# All text coming across the intarwebs is encoded in a character encoding.
# This could be ASCII, UTF-8, UTF-16, Shift-JIS, etc.  To properly
# handle data, you need to know the encoding.  Thankfully on the web
# the de facto standard seems to be moving towards UTF-8.
# 
# In order to safely deal with data, you want to decode this encoded
# data (referred to in the Python 2 world as a byte string, type str)
# from its encoding to the generic unicode data type, which Python can
# safely work with in all situations. Let's pretend we have some data
# foo that we have just read in from the intarwebs:

bar = foo.decode('utf-8')
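
# The decode above raises UnicodeDecodeError if foo is not valid UTF-8.
# A minimal sketch of one way to cope with that, assuming you would rather
# keep going with U+FFFD replacement characters than crash. The helper name
# to_text and the sample bytes are made up for illustration; the explicit
# b''/u'' literals let it run the same on Python 2 and Python 3.

```python
def to_text(raw, encoding='utf-8'):
    """Decode a byte string to unicode text (hypothetical helper)."""
    try:
        # Strict decoding first: fails loudly on invalid input.
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Fallback: undecodable bytes become U+FFFD instead of raising.
        return raw.decode(encoding, 'replace')


foo = b'caf\xc3\xa9'   # stand-in for bytes read from the network
bar = to_text(foo)     # unicode text: u'caf\xe9'
```

# Whether silently replacing bad bytes is acceptable depends on the data;
# for anything important, letting the strict decode fail may be safer.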

# bar is now safe to work with - no UnicodeDecodeErrors! When working
# with hard coded text strings, it's always good to write them like
# this so they are unicode and not byte strings:

hello = u'hello' # good!
goodbye = 'goodbye' # bad!

# The other thing you need to know is that when you send data out
# of your program you need to now *encode* it from its unicode
# representation to an encoding. Once again, utf-8 is usually
# a fine choice:

print bar.encode('utf-8')
with open('output.txt', 'w') as fp:
    fp.write(bar.encode('utf-8'))
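
# An alternative sketch for the file case: io.open (available since
# Python 2.6, and the same function as the built-in open on Python 3)
# takes an encoding argument and does the encode/decode at the file
# boundary for you. The helper names write_text and read_text, the sample
# text, and the temporary path are assumptions made for illustration.

```python
import io
import os
import tempfile


def write_text(path, text, encoding='utf-8'):
    """Write unicode text to path; io.open encodes it on the way out."""
    with io.open(path, 'w', encoding=encoding) as fp:
        fp.write(text)


def read_text(path, encoding='utf-8'):
    """Read path back as unicode text; io.open decodes it on the way in."""
    with io.open(path, 'r', encoding=encoding) as fp:
        return fp.read()


bar = u'caf\xe9'  # stand-in for unicode text built inside the program
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
write_text(path, bar)
```

# Note that a file opened this way expects unicode text, so passing it an
# already-encoded byte string raises a TypeError.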

# And that's basically it - the key is to know that at the edges
# of your program, i.e. as data is brought in or sent out, you should
# be decoding/encoding, and only working with unicode internally.

# It's a bit clunky, but once you get used to it and act in the
# manner above, it's nice because it's all very deliberate.