Hi,
sorry, I haven't checked the list for a week.
In article <200011300733.QAA03684@udlew10.uldev.lsi.nec.co.jp> you write:
[...]
>I saw your page.
>Is Unicode Consortium's Big5 table broken !? Is CP950 table broken too ?
>I can see some pages encoded in Big5 to display with EUC-JP
>except some characters.
[...]
The problem with the Unicode Consortium's Big5 table is that
Big5 is not a real standard, and various incompatible versions
(more than three or four) are in use. The following small experiment,
which I found a few weeks ago, still works:
- go to http://www.google.com/intl/zh-TW/ (Big5 charset)
- search "iso 15445"
- go to page 4; you should see a number of Japanese pages (e.g.,
the first page is www.ne.jp/asahi/minazuki/bakera/html/updatelog)
even though the charset is still Big5
- look at the summaries of the Japanese pages and see if they make
any sense. If your Big5 is "standard" (explained below) these
summaries should NOT make sense.
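
The comparison behind this experiment can be sketched in Python. This is only a rough illustration, not what Google or any browser actually does. The codec names `big5`, `cp950` and `big5hkscs` are Python's built-in tables, used here as stand-ins for "different Big5 versions"; none of them is claimed to match the TWMOE fonts or the Unicode Consortium's table exactly. The helper names `decode_all` and `tables_agree` are made up for this sketch, and the second sample is simply two bytes from the 0xC6A1-0xC8FE range, assumed here to be where the vendor kana extensions usually sit.

```python
# Python's built-in Big5-family codecs, used as stand-ins (an assumption);
# none is claimed to be identical to the TWMOE or Unicode Consortium tables.
CODECS = ("big5", "cp950", "big5hkscs")


def decode_all(raw, codecs=CODECS):
    """Decode the same bytes under each codec.

    Bytes a codec cannot map are replaced with U+FFFD instead of raising.
    """
    return {name: raw.decode(name, errors="replace") for name in codecs}


def tables_agree(raw, codecs=CODECS):
    """True if every codec yields identical text with no U+FFFD in it."""
    results = set(decode_all(raw, codecs).values())
    return len(results) == 1 and "\ufffd" not in next(iter(results))


if __name__ == "__main__":
    samples = {
        "common hanzi (A4A4 A4E5)": b"\xa4\xa4\xa4\xe5",
        # Assumed to fall in the extension area; shown for comparison only.
        "extension area (C6E7 C6E8)": b"\xc6\xe7\xc6\xe8",
    }
    for label, raw in samples.items():
        print(label, "->", "agree" if tables_agree(raw) else "DIFFER")
        for name, text in decode_all(raw).items():
            # Print code points rather than glyphs so the output does not
            # depend on the terminal's own charset.
            print("   ", name, ["U+%04X" % ord(ch) for ch in text])
```

Where the tables disagree on a byte pair, the same page shows different characters (or gibberish) depending on which table the reader's system uses.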
Around the time the Unicode Consortium released their Big5 mapping
table, the TWMOE (Taiwan Ministry of Education) released a set of
reference fonts which are now used by XFree86; because TWMOE is a
government agency and Big5 originates from Taiwan, I consider the
TWMOE fonts to represent a standard Big5 encoding.
Under Unix running X with the TWMOE fonts, or under MacOS 8 or 9,
the summaries of the Japanese pages show gibberish where kana should
appear; it seems that Google uses the Unicode Consortium's table.
(MacOS X is BSD but uses Unicode, so I don't know which Big5 it uses.
I don't have a spare Mac for testing.)
My first observation (highly subjective, but still an observation) is
that I have never seen any actual web page containing kana that uses
the Big5 encoding as described by the Unicode Consortium. All real
pages seem to use the encoding used by MacOS 8/9 or Unix.
My second observation is that the Big5 encoding as described by the
Unicode Consortium has missing Cyrillic letters in the part containing
the Cyrillic alphabet; this seems illogical for even an ad-hoc
"industry standard" like Big5. Assuming that the TWMOE knows what they
are doing (and I do assume that a government education agency knows
what they are doing), I tend to believe that the Unicode
Consortium got their Big5 encoding wrong. (I emailed them my report,
but I don't know what, if anything, will come of it.)
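
A check for gaps like the missing Cyrillic letters can be sketched as follows. This is a minimal sketch under stated assumptions: `missing_letters` is a hypothetical helper, the letter list is just the basic Russian alphabet, and Python's `big5`, `cp950` and `big5hkscs` codecs are stand-ins that may not reproduce the Unicode Consortium's table exactly, so the results only describe those codecs.

```python
def missing_letters(codec, letters):
    """Return the letters from `letters` that `codec` cannot encode."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Basic Russian alphabet: U+0410..U+044F plus the letter IO in both cases.
CYRILLIC = [chr(cp) for cp in range(0x0410, 0x0450)] + ["\u0401", "\u0451"]

if __name__ == "__main__":
    # Codec names are Python's built-ins, used as stand-ins (an assumption).
    for codec in ("big5", "cp950", "big5hkscs"):
        gaps = missing_letters(codec, CYRILLIC)
        # Report code points so the output is safe on any terminal charset.
        print(codec, len(gaps), "missing:",
              " ".join("U+%04X" % ord(ch) for ch in gaps))
```

A table that covers only part of the alphabet shows up as a non-empty list for that codec.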
I have also seen some "Big5" pages that seem to actually use HKSCS
(http://www.info.gov.hk/gccs/), which doesn't follow Big5 encoding
rules and of course isn't even described by the Unicode Consortium's
mapping table. IMHO the situation with Big5 is a completely pathetic
mess.
>-----------------------------------
>Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
> http://www2u.biglobe.ne.jp/~hsaka/
Ambrose
-- 
Ambrose Li <acli@ada.dhs.org>
http://trends.ca/~acli/
http://trends.ca/~acli/ambrose/not-work/opinions/vote2000.html
"A good style should show no sign of effort; what is written should
seem a happy accident." -- Somerset Maugham.
This archive was generated by hypermail 2b29 : Fri Dec 08 2000 - 23:17:29 CST