Inconsistent Character Encoding Problems

I’ve been wrecking my head trying to get some web pages I authored to display acute accents (“´”) and “ñ”s. It seems that every mention I googled suggests using the following line,

  <META http-equiv=Content-Type content="text/html; charset=UTF-8">

<http://www.cl.cam.ac.uk/~mgk25/unicode.html#web>
<http://www.interaktonline.com/Products/Dreamweaver-Extensions/MXRSSReader-Writer/Product-Forum/Details/121651/RSS+Reader+spanish+char+encoding+problem.html>

where UTF-8 supports Spanish characters. The problem is that it didn’t work for me on OS X (Tiger) with Firefox 2 or SeaMonkey 1.1.1, although the same document renders correctly with IE on Windows (XP?). On the other hand, ISO-Latin-1 (ISO-8859-1) and ISO-8859-3 (southern European) work on my OS X setup but do not render well with IE on Windows.
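
The symptoms described here are consistent with a page whose bytes were saved in one encoding while the header declares another. The following is a minimal Python sketch of that mismatch; the sample word and the assumption that the file was saved as ISO-8859-1 are illustrative only and are not taken from the pages discussed in this post.

```python
# Sketch: what a reader sees when the declared charset and the saved bytes disagree.
text = "mañana"  # illustrative sample, not from the original pages

latin1_bytes = text.encode("iso-8859-1")  # ñ is the single byte 0xF1
utf8_bytes = text.encode("utf-8")         # ñ is the two bytes 0xC3 0xB1

# Saved as ISO-8859-1 but read as UTF-8: 0xF1 is not a valid UTF-8 sequence here,
# so a lenient decoder substitutes U+FFFD (the replacement character).
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Saved as UTF-8 but read as ISO-8859-1: each byte is shown as its own character.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

print(repr(latin1_read_as_utf8))
print(repr(utf8_read_as_latin1))
```

Either direction garbles the “ñ”, which may explain why one header works on one setup and fails on another when the browsers fall back to different defaults.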

I tried all sorts of things, including setting my editor (Smultron) and ftp client (Cyberduck) to save files with working character encodings. As I said, success was mixed because different browsers seem to respond differently (and seemingly inconsistently) to different encodings. For instance, the SeaMonkey 1.1.1 and Firefox 2 defaults get overridden, while Safari’s do not.
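
When an editor and an ftp client both have encoding settings, it helps to check what actually ended up in the file. Below is a rough Python sketch, assuming you can read the saved file's raw bytes; `is_valid_utf8` is a hypothetical helper name, not part of any tool mentioned here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In-memory samples standing in for a saved file's contents.
print(is_valid_utf8("mañana".encode("utf-8")))       # bytes saved as UTF-8
print(is_valid_utf8("mañana".encode("iso-8859-1")))  # bytes saved as ISO-8859-1
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Note that pure ASCII text passes this check regardless of which of these encodings the editor was set to, so the test only says something once the page contains accented characters.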

The solution came with the following,
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>Info</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-utf-8">
<meta name="MSSmartTagsPreventParsing" content="true" />

where ‘xml:lang="en" lang="en"’ should be changed to ‘xml:lang="es" lang="es"’, as well as setting the appropriate character encoding by changing ‘content="text/html; charset=ISO-UTF-8"’ to ‘content="text/html; charset=ISO-8859-1"’ or ‘content="text/html; charset=ISO-8859-3"’. Keep in mind that “text/html” should be “text/plain” for plain text.
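
One thing worth checking in a header like this is the charset label itself. As far as standard charset names go, the label is plain “UTF-8”, and “ISO-UTF-8” does not appear to be a registered name, so a browser may ignore it and fall back to its own default. The Python sketch below pulls the label out of a meta tag and asks Python's codec registry whether it knows the name. The registry is only a rough stand-in for what browsers accept, and `declared_charset` and `is_known_label` are hypothetical helper names.

```python
import codecs
import re

# Matches the label that follows "charset=" inside a Content-Type meta tag.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def declared_charset(html: str):
    """Return the first charset label found in the markup, or None."""
    match = CHARSET_RE.search(html)
    return match.group(1) if match else None


def is_known_label(label: str) -> bool:
    """Return True if Python's codec registry recognises the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


header = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-utf-8">'
label = declared_charset(header)
print(label, is_known_label(label))

for candidate in ("UTF-8", "ISO-8859-1", "ISO-8859-3"):
    print(candidate, is_known_label(candidate))
```

A label that fails this kind of check is a hint that the header may not be doing what it seems to be doing.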

At some point I realized that just changing the “charset” line is not enough, as most googled answers are fond of suggesting; ‘xml:lang="en" lang="en"’ also has to be changed for a consistent and reproducible rendering change to occur (it pays to read the source).

After fixing this issue, I inserted the entire above code into a simple .html page I created (the above code is from a .xhtml template) and it also seemingly solved the issue (see “UPDATE”s below).

The last line (the “MSSmartTagsPreventParsing” meta tag) is not necessary, as it relates to preventing m$ from hijacking your web page for advertising (the info is a little dated). For more on this, read the following.
<http://www.therapist-uk.net/Net/Isp/IspHelp/MetaTags2.htm>

I can’t believe the headaches associated with this issue because of the various points to consider. And I’m not going to start listing UTF-8 in my pages any time soon as its implementation seems flaky cross platform, at best. So to sum up, if you’re writing a European language (and it’s probably safe to assume that most of us on the net are) UTF-8 bad, ISO-8859-1 and ISO-8859-3 good, but don’t leave out the “lang” settings either.

UPDATE:
I checked with a friend who runs IE on Windows, and she verified that there are still display problems. Well, I thought about it and vaguely remembered browsing a page on UTF-8 mentioning that changing the header info alone is not enough and will cause problems (verified); you also need to convert the source itself to UTF-8. SeaMonkey’s Composer has a converter function under “File”. It even changes your header for you (but leaves the “lang” settings alone). The conversion hasn’t created any more character problems than just changing the header info to UTF-8 did. I’ll cross my fingers till I can get a hold of a Windows/IE computer.
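
For anyone without SeaMonkey’s Composer, the same idea can be sketched in a few lines of Python: decode the page from its old encoding, point the header at UTF-8, and write the bytes back out as UTF-8. This is a rough sketch that assumes the page was saved as ISO-8859-1 and carries a single “charset=” label; `convert_latin1_page_to_utf8` is a hypothetical name and the sample markup is made up for illustration.

```python
import re


def convert_latin1_page_to_utf8(raw: bytes) -> bytes:
    """Re-encode an ISO-8859-1 page as UTF-8 and update its charset label."""
    text = raw.decode("iso-8859-1")  # assumption: the source really is ISO-8859-1
    # Rewrite only the first charset label so the header matches the new bytes.
    text = re.sub(
        r"(charset\s*=\s*)[A-Za-z0-9._:-]+",
        r"\g<1>UTF-8",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")


sample = (
    b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">\n'
    b"<p>ma\xf1ana</p>\n"
)
converted = convert_latin1_page_to_utf8(sample)
print(converted.decode("utf-8"))
```

The point is that the header and the bytes change together, which is what the header-only fix misses. The “lang” attributes are left alone here, as Composer does.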

UPDATE: Confirmed! The conversion to UTF-8, together with the above ‘xml:lang="en" lang="en"’ and ‘content="text/html; charset=ISO-UTF-8"’ considerations, works.

I’ll write it again, “I can’t believe the headaches associated with this issue because of the various points to consider”! Even so, UTF-8 displays well in various browsers and platforms; I guess I’m a UTF-8 believer.

Maurice Cepeda

This is licensed under the Attribution-NonCommercial-ShareAlike 3.0 Unported Creative Commons License. All brands mentioned are properties of their respective owners. By reading this article, the reader forgoes any accountability of the writer. The reading of this article implies acceptance of the above stipulations. The author requires attribution –by full name and URL– and notification of republications.
