Hello,
I have tried the internationalization / multilingualization of w3m.
The patch for w3m-0.1.4 is available on the following site.
# The patch is over 800 Kbytes in size.
http://www2u.biglobe.ne.jp/~hsaka/w3m/patch/w3m-0.1.4-i18n-5.patch.gz
README.i18n-en (this mail)
README.i18n (for Japanese)
It is a development version, and it has not been tested enough because
I understand only Japanese. Please use it, test it, and report bugs.
(My English is poor; please correct README.i18n-en.)
With the patch applied, w3m has the following functions.
Coding system (Character set)
The following coding systems (character sets) are supported.
* Japanese
EUC-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
ISO-2022-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
JIS C 6226, etc.
Shift-JIS - US_ASCII, JIS X 0208, JIS X 0201
* Chinese
EUC-CN(CN-GB) - US_ASCII, GB 2312
ISO-2022-CN - US_ASCII, GB 2312, GB 1988, CNS-11643-1,..7, etc.
HZ - US_ASCII, GB 2312
* Chinese (Taiwan)
EUC-TW - US_ASCII, CNS 11643-1,..16
ISO-2022-CN - US_ASCII, CNS-11643-1,..7, GB 2312, GB 1988, etc.
Big5
* Korean
EUC-KR - US_ASCII, KS X 1001 (KS C 5601)
ISO-2022-KR - US_ASCII, KS X 1001 (KS C 5601), etc.
* Vietnamese
TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
* Thai
TIS-620 (ISO-8859-11), CP874
* Latin etc.
US_ASCII, ISO-8859-1 to 10, 13 to 15,
KOI8-R, NeXT,
CP424, CP437, CP737, CP775, CP850, CP852, CP855, CP856, CP857,
CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
* Unicode (UCS-2)
UTF-8
NOTE:
* The left part of JIS X 0201 and GB 1988 (Chinese ASCII) are
treated as US_ASCII, because they are used in the tags of HTML documents.
Other variants of US_ASCII are left unchanged.
* JIS C 6226 (old JIS) is treated as JIS X 0208.
* The right part of JIS X 0201 (Katakana) is kept as is and is not
converted to JIS X 0208, unless w3m is run with the
-K option.
* The sequence '~\n' of HZ is not supported.
* UTF-8 is converted to character sets based on ISO-2022 at loading time.
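As an illustration of the Katakana note above: the right part of JIS X 0201
holds half-width Katakana, and JIS X 0208 holds the full-width forms. The
following Python sketch is not taken from the patch. It only imitates the
effect described for the -K option, using Unicode compatibility normalization
(NFKC) on half-width runs. The name widen_katakana is hypothetical, and
whether w3m combines voiced sound marks the same way is an assumption.

```python
import re
import unicodedata

# Unicode range that mirrors the right part of JIS X 0201
# (half-width Katakana and related punctuation).
_HALFWIDTH_RUN = re.compile("[\uff61-\uff9f]+")


def widen_katakana(text: str) -> str:
    """Replace half-width Katakana runs with full-width equivalents.

    Hypothetical helper: it imitates the effect described for -K,
    not the patch's actual code. Everything outside the half-width
    range is left untouched.
    """
    return _HALFWIDTH_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


if __name__ == "__main__":
    half = "\uff76\uff85"  # half-width KA, NA
    full = widen_katakana(half)
    # In EUC-JP the half-width form needs the SS2 prefix (0x8E),
    # while the full-width form is a plain JIS X 0208 byte pair.
    print(half.encode("euc_jp").hex(), "->", full.encode("euc_jp").hex())
```

Restricting the normalization to half-width runs keeps other characters,
such as full-width Latin letters, unchanged.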
Code conversion
The following special code conversions are supported.
* EUC-JP <-> ISO-2022-JP <-> Shift-JIS
* EUC-CN <-> ISO-2022-CN <-> HZ
* EUC-TW <-> ISO-2022-CN <-> Big5
* EUC-KR <-> ISO-2022-KR
Other code conversions are attempted using Unicode mapping tables.
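To make the two kinds of conversion concrete, here is a small Python sketch.
It is not the patch's code. euc_jp_to_shift_jis rewrites bytes arithmetically,
the way a direct EUC-JP to Shift-JIS conversion can work without mapping
tables; convert_via_unicode goes through Unicode, similar in spirit to a
table-based conversion. Both function names are hypothetical, the direct
function handles only US_ASCII, JIS X 0208 and JIS X 0201 Katakana, and
Python's codecs stand in for w3m's own tables.

```python
def euc_jp_to_shift_jis(data: bytes) -> bytes:
    """Direct EUC-JP -> Shift-JIS conversion by byte arithmetic (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # US_ASCII passes through unchanged.
            out.append(b)
            i += 1
        elif b == 0x8E and i + 1 < len(data):
            # SS2 + byte: JIS X 0201 Katakana, a single byte in Shift-JIS.
            out.append(data[i + 1])
            i += 2
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):
            # Two-byte JIS X 0208 character: strip the high bits,
            # then fold two JIS rows into one Shift-JIS lead byte.
            j1, j2 = b - 0x80, data[i + 1] - 0x80
            s1 = ((j1 - 0x21) >> 1) + 0x81
            if s1 > 0x9F:
                s1 += 0x40
            if j1 & 1:
                s2 = j2 + 0x1F
                if s2 >= 0x7F:
                    s2 += 1
            else:
                s2 = j2 + 0x7E
            out += bytes((s1, s2))
            i += 2
        else:
            # JIS X 0212 (SS3) and truncated input are outside this sketch.
            raise ValueError(f"unsupported byte 0x{b:02x} at offset {i}")
    return bytes(out)


def convert_via_unicode(data: bytes, src: str, dst: str) -> bytes:
    """Convert by decoding to Unicode and encoding again."""
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    euc = "\u65e5\u672c\u8a9e".encode("euc_jp")  # "nihongo" in Kanji
    direct = euc_jp_to_shift_jis(euc)
    pivot = convert_via_unicode(euc, "euc_jp", "shift_jis")
    print(direct.hex(), pivot.hex())
```

The same pivot function also covers pairs such as EUC-CN and HZ, for example
convert_via_unicode(data, "gb2312", "hz") with Python's codec names.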
Display
If a coding system based on 7-bit ISO-2022-* is used as the display code,
several character sets based on ISO-2022 can be displayed mixed together.
If another coding system is used, code conversion using Unicode
is attempted.
NOTE:
* UTF-8 is not used as a display coding system.
Options
-O <coding system>
Set the display/output coding system.
-I <coding system>
Set the document coding system. Any coding system except ISO-2022-*
can be set.
<coding system>:
j(p): ISO-2022-JP - JIS X 0208 / US_ASCII
j1: ISO-2022-JP - JIS X 0208 / JIS X 0201
j2: ISO-2022-JP - JIS C 6226 / US_ASCII
j3: ISO-2022-JP - JIS C 6226 / JIS X 0201
cn: ISO-2022-CN - GB 2312 / CNS 11643
kr: ISO-2022-KR - KS X 1001
e(j): EUC-JP
ec: EUC-CN
et: EUC-TW
ek: EUC-KR
s(jis): Shift-JIS
h(z): HZ
b(ig5): Big5
l?: ISO-8859-?
t(is): TIS-620(ISO-8859-11)
tc(vn): TCVN-5712 VN-1
v(iscii): VISCII 1.1
vp(s): VPS
koi: KOI8-R
n(ext): NeXT
cp???: CP???
w12??: CP12??
u(tf8): UTF-8
-L <language>
Set the preferred language in the internal coding system.
<language>:
ja_JP(j), zh_CN(c), zh_TW(t), zh_TW_Big5(b), ko_KR(k)
-K The Katakana part of JIS X 0201 is converted to JIS X 0208.
-U Code conversion using Unicode is enabled.
-G Use the Unicode map of GB 12345 as the map for GB 2312.
This option is useful when GB 2312 is converted to JIS X 0208,
CNS 11643 or Big5.
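The options above can be combined on one command line. The Python sketch
below only builds an argument list; it does not run w3m. Several things in it
are assumptions rather than facts from the patch: that the patched binary is
called "w3m", that the option and its value are passed as separate arguments,
and that an entry such as s(jis) means both "s" and "sjis" are accepted. The
names spellings and w3m_command are hypothetical.

```python
import re
from typing import List, Optional

# Entries copied from the <coding system> table of this README.
# Wildcard entries (l?, cp???, w12??) are left out of this sketch.
TABLE_ENTRIES = [
    "j(p)", "j1", "j2", "j3", "cn", "kr", "e(j)", "ec", "et", "ek",
    "s(jis)", "h(z)", "b(ig5)", "t(is)", "tc(vn)", "v(iscii)", "vp(s)",
    "koi", "n(ext)", "u(tf8)",
]


def spellings(entry: str) -> List[str]:
    """Expand 's(jis)' into ['s', 'sjis'] (assumed meaning of the parentheses)."""
    match = re.fullmatch(r"([a-z0-9]+)(?:\(([a-z0-9]+)\))?", entry)
    if match is None:
        raise ValueError(f"unexpected table entry: {entry!r}")
    short, rest = match.groups()
    return [short] if rest is None else [short, short + rest]


KNOWN_CODES = {code for entry in TABLE_ENTRIES for code in spellings(entry)}


def w3m_command(url: str,
                document: Optional[str] = None,
                display: Optional[str] = None) -> List[str]:
    """Build an argument list using the -I and -O options described above."""
    args = ["w3m"]  # assumed name of the patched binary
    for flag, code in (("-I", document), ("-O", display)):
        if code is None:
            continue
        if code not in KNOWN_CODES:
            raise ValueError(f"unknown coding system abbreviation: {code!r}")
        args += [flag, code]
    return args + [url]


if __name__ == "__main__":
    # EUC-JP document shown on a Shift-JIS terminal.
    print(w3m_command("http://www2u.biglobe.ne.jp/~hsaka/",
                      document="e", display="sjis"))
```

Building a list rather than a single string avoids shell quoting problems if
the list is later handed to a process launcher.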
Option panel
The display/output coding system, document coding system,
preferred language, and conversion options can be set.
Change the coding system of the document
The user can change the coding system of a document after loading it.
Press the '=' key and select the coding system of the document.
Line Editing
The input coding system follows the display coding system.
NOTE:
* HZ and UTF-8 cannot be used as the input coding system.
* Input in ISO-2022-CN or ISO-2022-KR will probably fail, because
SI (\017) and SO (\016) are already bound to other command keys
(SO is bound to `next-history'). If you want to use SI and SO,
press C-@ (^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
7-bit ISO-2022 are recognized. Pressing C-@ again restores the default
bindings.
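The reason SI and SO matter here is that 7-bit ISO-2022 encodings use them to
switch between character sets inside the byte stream. The Python sketch below
shows this with Python's own iso2022_kr codec, which stands in for w3m's
handling; the helper name shift_controls is hypothetical.

```python
from typing import List

SO = 0x0E  # Shift Out, written \016 in octal
SI = 0x0F  # Shift In,  written \017 in octal


def shift_controls(data: bytes) -> List[str]:
    """Return the SO/SI control codes found in data, in order."""
    names = {SO: "SO", SI: "SI"}
    return [names[b] for b in data if b in names]


if __name__ == "__main__":
    hangul = "\ud55c".encode("iso2022_kr")  # one Hangul syllable
    # The encoded bytes carry shift controls, so a line editor that
    # treats these bytes as command keys cannot receive the text.
    print(hangul, shift_controls(hangul))
    print(shift_controls(b"w3m"))
```

Plain ASCII input contains no shift controls, which is why only the
multibyte input is affected by the key bindings.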
Regular expression (not supported)
I have no plan to support multilingual regular expressions,
because in almost all cases they are not necessary.
-------------------------------------------
Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
http://www2u.biglobe.ne.jp/~hsaka/