It's a misleading error report that comes from the way Python handles the decoding/encoding process. You tried to decode an already decoded string a second time, and that confuses the Python function, which retaliates by confusing you in turn! ;-) As far as I know, the encoding/decoding process is handled by the codecs module, and somewhere in there lies the origin of these misleading exception messages.
You can check for yourself: either
u'\x80'.encode('ascii')
or
u'\x80'.decode('ascii')
will raise a UnicodeEncodeError, whereas
u'\x80'.encode('utf8')
will not, but
u'\x80'.decode('utf8')
will again!
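One way to see where the UnicodeEncodeError comes from is to spell out what Python 2 appears to do when decode is called on a unicode object: it first encodes the object back to bytes with the default codec, and only then runs the requested decode. The sketch below assumes that default codec is ascii (the usual Python 2 setting); py2_style_decode is a hypothetical helper written for illustration, not a function from the codecs module.

```python
def py2_style_decode(text, codec):
    # Hypothetical helper: spells out the two steps Python 2 performs
    # when decode() is called on a unicode object.
    raw = text.encode('ascii')   # hidden step, assumed default codec: ascii
    return raw.decode(codec)     # the decode that was actually requested


try:
    py2_style_decode(u'\x80', 'utf8')
    error_name = None
except UnicodeError as exc:
    error_name = type(exc).__name__

# The failure comes from the hidden encode step, not from the decode.
print(error_name)

# Encoding directly with utf8 works, because utf8 can represent U+0080.
encoded = u'\x80'.encode('utf8')
```

Pure ASCII text passes through both steps unchanged, which is why the problem only shows up once a non-ASCII character is involved.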
I guess you are confused about the meaning of encoding and decoding.
To put it simply:
                      decode               encode
ByteString (ascii)  ---------->  UNICODE  ---------->  ByteString (utf8)
                      codec                codec
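The diagram can be walked through in a few lines. This is only a sketch: the sample values are made up for illustration, and latin-1 is used in the second half simply because pure ASCII bytes look identical in utf8.

```python
# Byte string -> unicode: decode with the codec the bytes were written in.
ascii_bytes = b'hello'
text = ascii_bytes.decode('ascii')

# Unicode -> byte string: encode with the codec you want to get out.
utf8_bytes = text.encode('utf8')

# With a non-ASCII character the input and output bytes differ.
latin1_bytes = b'\xe9'                      # e-acute stored as latin-1
e_acute = latin1_bytes.decode('latin-1')    # one unicode character, U+00E9
utf8_form = e_acute.encode('utf8')          # two bytes in utf8
```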
But why is there a codec argument for the decode method? Well, the underlying function cannot guess which codec the ByteString was encoded with, so it takes codec as an argument, as a hint. If none is provided, it assumes you mean the one returned by sys.getdefaultencoding() to be used implicitly.
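To illustrate why the hint is needed, here is a small sketch in which the same two bytes decode to different text depending on the codec passed in. The sample bytes are an assumption chosen for the demonstration.

```python
import sys

data = b'\xc3\xa9'   # two bytes; nothing in them names a codec

as_utf8 = data.decode('utf8')        # read as one character, U+00E9
as_latin1 = data.decode('latin-1')   # read as two characters, U+00C3 U+00A9

# The fallback codec: 'ascii' on a stock Python 2, 'utf-8' on Python 3.
default_codec = sys.getdefaultencoding()
print(default_codec)
```

Both calls succeed, so a wrong hint does not always produce an exception; it can just as well produce the wrong text.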
So when you use c.decode('ascii'), you a) have an (encoded) ByteString (that's why you use decode), b) want to get a unicode object back (that's what you use decode for), and c) state that the codec in which the ByteString is encoded is ascii.
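The three points can be checked in a short sketch. The variable c follows the name used above, while the sample bytes are assumptions. The second half shows what happens when point c) does not hold: the exception is then a UnicodeDecodeError, which is the one you would expect from a decode call.

```python
c = b'abc'                               # a) an encoded byte string
u = c.decode('ascii')                    # b) a unicode object comes back
is_unicode = isinstance(u, type(u''))    # True: same type as a u'' literal

bad = b'\x80'                            # c) violated: 0x80 is not an ascii byte
try:
    bad.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = type(exc).__name__

print(failure)
```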
See also:
https://stackoverflow.com/a/370199/1107807
http://docs.python.org/howto/unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://www.stereoplex.com/blog/python-unicode-and-unicodedecodeerror