Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

UTF-8 - Character Encoding Issue - Strange Behaviour From Pound Signs (£) with UTF-8 IE6 / ASP / XML

I am having a very strange problem with pound signs displaying incorrectly (or not at all) on a web page.

I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window.

The worst part is that this is inconsistent. I have three different things happening:
1. Pound sign doesn't even appear in source code (assume XML is stripping this out as it appears to use UTF-8 by default).
2. Pound sign appears in source but not on web page.
3. Pound sign appears in source AND FINE on web page (usually, if this happens at all, the first time this is displayed).

Now, this is just one specific part of a bigger problem. I've been looking generally into this and done some research, and it appears that if I have ISO 8859-1 (Western European) text and convert it to UTF-8, it has no idea what the symbol is and removes it completely (in this case, though I have seen it replaced by a '?', a box, or an upside-down '?' elsewhere).

If you input the pound sign as UTF-8 and convert back to ISO 8859-1, it gains a capital A hat (Â) before the pound sign.

I can understand the latter, at least on a basic level - it is because our system must have pound signs saved (or stored in Oracle) with different character encodings throughout it, and, as we don't specify the character encoding (at least generally) for our web pages, sometimes IE gets confused and doesn't display things correctly.

What I don't understand is the inconsistent outcome outlined above.

I realise that I have been a bit vague in my initial explanation, but I hoped that writing it out might help me get my thoughts straight, and possibly help others understand similar problems in the future.

EDIT: Also, I realise I could exchange all the pound signs for the HTML entity (&pound;), but I feel this is time consuming and messy (what if it is stored in Oracle and is later passed to PDF, Excel, etc?).

Obviously any pointers and advice would be appreciated though!

Thanks!


1 Reply


"I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window."

The problem is most likely embedded in this sequence. It would help if you could elaborate on the specifics of how this sequence is achieved.

The most common cause of this sort of problem is a mismatch between how the client actually encodes a character and what the server thinks the encoding is. The simplest solution is to place the accept-charset attribute on the form element, which makes the character encoding of a post explicit.

For example, with accept-charset="UTF-8" on the form, the text posted in a field named stuff will be encoded in UTF-8.
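
To make the mismatch concrete, here is a minimal Python sketch (not ASP, and not your actual pipeline) of what happens when the receiver assumes a different encoding from the one the sender used. The helper name decode_as is made up for illustration; the two cases correspond to the "capital A hat" and the "missing or replaced pound sign" symptoms described in the question.

```python
POUND = "\u00a3"  # the pound sign


def decode_as(raw: bytes, assumed: str) -> str:
    """Decode bytes the way a receiver that assumes `assumed` would (hypothetical helper)."""
    return raw.decode(assumed, errors="replace")


utf8_bytes = POUND.encode("utf-8")          # b'\xc2\xa3' (two bytes)
latin1_bytes = POUND.encode("iso-8859-1")   # b'\xa3' (one byte)

# Sender used UTF-8, receiver assumes ISO-8859-1:
# each byte becomes its own character, so a stray 'Â' appears before '£'.
as_latin1 = decode_as(utf8_bytes, "iso-8859-1")

# Sender used ISO-8859-1, receiver assumes UTF-8:
# a lone 0xA3 byte is not valid UTF-8, so it is replaced (U+FFFD) ...
as_utf8 = decode_as(latin1_bytes, "utf-8")

# ... or silently dropped if the decoder ignores invalid input.
dropped = latin1_bytes.decode("utf-8", errors="ignore")

print(repr(as_latin1), repr(as_utf8), repr(dropped))
```

Whether a given component replaces or drops the invalid byte depends on that component, which is one plausible source of the "sometimes a box, sometimes nothing" behaviour.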

Some reasons for the inconsistencies are:

  1. It is possible that the server encodes the characters in the database incorrectly but, when sending those same characters back to a browser, reverses the corruption, so things look fine again in the browser.
  2. ISO-8859-1 means different things in different places. IE6 is somewhat loose with that character set, and will actually treat it as Windows-1252. Other applications place a stricter interpretation on ISO-8859-1.
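
Both points can be sketched in a few lines of Python. This is only an illustration under assumed encodings (a UTF-8 browser and a server that wrongly assumes ISO-8859-1); the variable names are invented and nothing here reflects how your ASP or Oracle layer is actually configured.

```python
POUND = "\u00a3"  # the pound sign

# Point 1: corruption on the way in, reversed on the way out.
sent = POUND.encode("utf-8")             # browser posts UTF-8 bytes
stored = sent.decode("iso-8859-1")       # server misreads them; 'Â£' is what gets stored
returned = stored.encode("iso-8859-1")   # server writes out using the same wrong assumption
shown = returned.decode("utf-8")         # browser reads UTF-8 again and sees '£'

# Point 2: ISO-8859-1 and Windows-1252 disagree on bytes 0x80-0x9F.
byte = b"\x80"
strict = byte.decode("iso-8859-1")       # a C1 control character (U+0080)
loose = byte.decode("cp1252")            # the euro sign (U+20AC)

# The pound sign itself (0xA3) is the same in both character sets.
same_pound = b"\xa3".decode("iso-8859-1") == b"\xa3".decode("cp1252")

print(repr(stored), repr(shown), repr(strict), repr(loose), same_pound)
```

In the first case the page looks correct even though the stored value is wrong, so anything else reading the same stored value (a PDF or Excel export, say) would see the corrupted form.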
