What exactly is a Unicode string?
What's the difference between a regular string and a Unicode string?
What is UTF-8?
I'm trying to learn Python right now, and I keep hearing this buzzword. What does the code below do?
i18n Strings (Unicode)
> ustring = u'A unicode \u018e string \xf1'
> ustring
u'A unicode \u018e string \xf1'
## (ustring from above contains a unicode string)
> s = ustring.encode('utf-8')
> s
'A unicode \xc6\x8e string \xc3\xb1' ## bytes of utf-8 encoding
> t = unicode(s, 'utf-8') ## Convert bytes back to a unicode string
> t == ustring ## It's the same as the original, yay!
True
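For comparison, here is a minimal sketch of the same round trip in Python 3 syntax. This is an assumption on my part, since the session above is Python 2, where the `u'...'` prefix and `unicode()` are needed. The names `encoded` and `decoded` are illustrative only. It shows that the text string counts code points while the encoded form counts bytes.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds encoded data.
ustring = 'A unicode \u018e string \xf1'  # in Python 3 every str is Unicode

encoded = ustring.encode('utf-8')   # text -> UTF-8 bytes
decoded = encoded.decode('utf-8')   # UTF-8 bytes -> text

print(repr(encoded))        # the two non-ASCII characters take two bytes each
print(len(ustring))         # length in code points
print(len(encoded))         # length in bytes
print(decoded == ustring)   # the round trip preserves the text
```

The two lengths differ because UTF-8 stores ASCII characters in one byte each and the two non-ASCII characters here in two bytes each.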
Files Unicode
import codecs
f = codecs.open('foo.txt', 'rU', 'utf-8')
for line in f:
    # here line is a *unicode* string
    print(repr(line))
f.close()
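As a rough sketch of the same idea with `io.open`, which accepts an `encoding` argument and, as far as I can tell, behaves the same in Python 2.6+ and Python 3: the file path and its contents below are made up purely so the example runs on its own.

```python
import io
import os
import tempfile

# Hypothetical file and contents, created here only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\xe9\n\u018e\n')

lines = []
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        # each line arrives already decoded from UTF-8 bytes to text
        lines.append(line)

print(lines)
```

The `with` block closes the file automatically, so no explicit `close()` call is needed.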