A comment from a Korean user.
Forget ISO-2022-KR. Nobody uses that encoding for web pages
in Korea nowadays.
On Mon, Jan 17, 2000 at 11:54:53PM +0900, Hironori Sakamoto wrote:
> Hello,
>
> I have tried the internationalization / multilingualization of w3m.
> The patch for w3m-0.1.4 is available at the following site.
> # The patch is over 800 Kbytes in size.
>
> http://www2u.biglobe.ne.jp/~hsaka/w3m/patch/w3m-0.1.4-i18n-5.patch.gz
> README.i18n-en (this mail)
> README.i18n (for Japanese)
>
> It is a development version, and it has not been tested enough because
> I understand only Japanese. Please use it, test it, and report bugs.
> (My English is poor; please correct README.i18n-en.)
>
> With the patch applied, w3m has the following functions.
>
> Coding system (Character set)
>
> The following coding systems (character sets) are supported.
>
> * Japanese
> EUC-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
> ISO-2022-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
> JIS C 6226, etc.
> Shift-JIS - US_ASCII, JIS X 0208, JIS X 0201
> * Chinese
> EUC-CN(CN-GB) - US_ASCII, GB 2312
> ISO-2022-CN - US_ASCII, GB 2312, GB 1988, CNS-11643-1,..7, etc.
> HZ - US_ASCII, GB 2312
> * Chinese (Taiwan)
> EUC-TW - US_ASCII, CNS 11643-1,..16
> ISO-2022-CN - US_ASCII, CNS-11643-1,..7, GB 2312, GB 1988, etc.
> Big5
> * Korean
> EUC-KR - US_ASCII, KS X 1001 (KS C 5601)
> ISO-2022-KR - US_ASCII, KS X 1001 (KS C 5601), etc.
> * Vietnamese
> TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
> * Thai
> TIS-620 (ISO-8859-11), CP874
> * Latin etc.
> US_ASCII, ISO-8859-1 〜 10, 13 〜 15,
> KOI8-R, NeXT,
> CP424, CP437, CP737, CP775, CP850, CP852, CP855, CP856, CP857,
> CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
> CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
> * Unicode (UCS-2)
> UTF-8
>
> NOTE:
> * The left parts of JIS X 0201 and GB 1988 (Chinese ASCII) are
> treated as US_ASCII because they are used in the tags of HTML documents.
> Other variants of US_ASCII are left unchanged.
> * JIS C 6226 (old JIS) is treated as JIS X 0208.
> * The right part of JIS X 0201 (Katakana) is left as is, without
> conversion to JIS X 0208. It is converted when w3m is run with the
> -K option.
> * The sequence '~\n' of HZ is not supported.
> * UTF-8 is converted to character sets based on ISO-2022 at loading.
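
For readers unfamiliar with HZ, here is a minimal Python sketch of what the
encoding looks like on the wire. It uses Python's built-in `hz` codec, not
w3m's own converter, so it only illustrates the format: GB 2312 text is
wrapped in `~{` and `~}` and sent as 7-bit bytes. In the HZ specification,
`~` followed by a newline is a line-continuation marker, which is the
sequence the note above says is not supported. The helper name `to_hz` is
made up for this example.

```python
def to_hz(text: str) -> bytes:
    """Encode text as HZ with Python's built-in codec (not w3m code)."""
    return text.encode("hz")


sample = "中文 ok"
encoded = to_hz(sample)

# GB 2312 characters sit between the ~{ and ~} escapes; ASCII is sent as is.
print(encoded)

# Decoding gives back the original text.
print(encoded.decode("hz") == sample)
```
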
>
> Code conversion
>
> The following special code conversions are supported.
> * EUC-JP <-> ISO-2022-JP <-> Shift-JIS
> * EUC-CN <-> ISO-2022-CN <-> HZ
> * EUC-TW <-> ISO-2022-CN <-> Big5
> * EUC-KR <-> ISO-2022-KR
>
> Other code conversions are attempted using Unicode mapping tables.
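
To make the idea of conversion through Unicode concrete, here is a small
Python sketch. It is not how the patch is implemented: the patch converts
the pairs listed above directly and falls back to Unicode tables for the
rest, while this sketch always pivots through Unicode using Python's
built-in codecs. The function name `convert` is hypothetical.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one coding system to another via Unicode."""
    return data.decode(src).encode(dst)


# EUC-KR -> ISO-2022-KR
euc_kr = "한글".encode("euc_kr")
iso_kr = convert(euc_kr, "euc_kr", "iso2022_kr")
print(iso_kr.decode("iso2022_kr"))

# EUC-JP -> Shift-JIS
euc_jp = "日本語".encode("euc_jp")
sjis = convert(euc_jp, "euc_jp", "shift_jis")
print(sjis.decode("shift_jis"))
```

A conversion like this raises UnicodeEncodeError when the target coding
system has no mapping for a character, which is the case where a real
converter needs some fallback.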
>
> Display
>
> If a coding system based on 7-bit ISO-2022-* is used as the display code,
> several character sets based on ISO-2022 can be displayed mixed together.
> If another coding system is used, code conversion using Unicode
> is tried.
>
> NOTE:
> * UTF-8 is not used as a display coding system.
>
> Options
>
> -O <coding system>
> Set the display/output coding system.
> -I <coding system>
> Set the document coding system. Any coding system except ISO-2022-*
> can be set.
> <coding system>:
> j(p): ISO-2022-JP - JIS X 0208 / US_ASCII
> j1: ISO-2022-JP - JIS X 0208 / JIS X 0201
> j2: ISO-2022-JP - JIS C 6226 / US_ASCII
> j3: ISO-2022-JP - JIS C 6226 / JIS X 0201
> cn: ISO-2022-CN - GB 2312 / CNS 11643
> kr: ISO-2022-KR - KS X 1001
> e(j): EUC-JP
> ec: EUC-CN
> et: EUC-TW
> ek: EUC-KR
> s(jis): Shift-JIS
> h(z): HZ
> b(ig5): Big5
> l?: ISO-8859-?
> t(is): TIS-620(ISO-8859-11)
> tc(vn): TCVN-5712 VN-1
> v(iscii): VISCII 1.1
> vp(s): VPS
> koi: KOI8-R
> n(ext): NeXT
> cp???: CP???
> w12??: CP12??
> u(tf8): UTF-8
> -L <language>
> Set the preferred language in the internal coding system.
> <language>:
> ja_JP(j), zh_CN(c), zh_TW(t), zh_TW_Big5(b), ko_KR(k)
> -K Convert the Katakana part of JIS X 0201 to JIS X 0208.
> -U Enable code conversion using Unicode.
> -G Use the Unicode map of GB 12345 in place of the map of GB 2312.
> This option is useful when GB 2312 is converted to JIS X 0208,
> CNS 11643 or Big5.
>
> Option panel
>
> The display/output coding system, document coding system,
> preferred language, and conversion options can be set.
>
> Change the coding system of the document
>
> The user can change the coding system of the document after loading.
> Press the '=' key and select the coding system of the document.
>
> Line Editing
>
> The input coding system follows the display coding system.
>
> NOTE:
> * HZ and UTF-8 cannot be used as the input coding system.
> * Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
> SI(\017) and SO(\016) are already assigned to other command keys
> (SO is assigned to `next-history'). If you want to use SI and SO,
> press C-@(^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
> 7-bit ISO-2022 are recognized. When you press C-@ again, the default
> bindings are restored.
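
The clash with the key bindings is easy to see by looking at the bytes. A
minimal Python sketch, using Python's built-in `iso2022_kr` codec rather
than anything from w3m (the function name `shift_controls` is made up for
this example): it lists where SO and SI occur in an encoded string. These
are the same bytes as C-n and C-o on a terminal.

```python
SO = 0x0E  # Shift Out, octal \016, same byte as C-n
SI = 0x0F  # Shift In,  octal \017, same byte as C-o


def shift_controls(text: str, codec: str = "iso2022_kr"):
    """Return (offset, name) for every SO/SI byte in the encoded text."""
    data = text.encode(codec)
    return [(i, "SO" if b == SO else "SI")
            for i, b in enumerate(data) if b in (SO, SI)]


# Hangul followed by ASCII: the encoder must shift out, then shift back in.
for offset, name in shift_controls("한 a"):
    print(offset, name)
```
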
>
> Regular expression (not supported)
>
> I have no plan to support multilingual regular expressions,
> because in almost all cases they aren't necessary.
> -------------------------------------------
> Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
> http://www2u.biglobe.ne.jp/~hsaka/