[w3m-dev-en 00003] w3m-i18n/m17n

From: Hironori Sakamoto (hsaka@mth.biglobe.ne.jp)
Date: Mon Jan 17 2000 - 08:54:53 CST


Hello,

  I have been working on the internationalization / multilingualization
  of w3m. The patch for w3m-0.1.4 is available at the following site.
  # The patch is over 800 Kbytes in size.

    http://www2u.biglobe.ne.jp/~hsaka/w3m/patch/w3m-0.1.4-i18n-5.patch.gz
                                                README.i18n-en (this mail)
                                                README.i18n (for Japanese)

  This is a development version, and it has not been tested enough
  because I understand only Japanese. Please use it, test it, and report
  bugs. (My English is poor, so please correct README.i18n-en.)

  w3m with the patch applied has the following features.
  
Coding system (Character set)

  The following coding systems (character sets) are supported.

  * Japanese
      EUC-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
      ISO-2022-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
                      JIS C 6226, etc.
      Shift-JIS - US_ASCII, JIS X 0208, JIS X 0201
  * Chinese
      EUC-CN(CN-GB) - US_ASCII, GB 2312
      ISO-2022-CN - US_ASCII, GB 2312, GB 1988, CNS-11643-1,..7, etc.
      HZ - US_ASCII, GB 2312
  * Chinese (Taiwan)
      EUC-TW - US_ASCII, CNS 11643-1,..16
      ISO-2022-CN - US_ASCII, CNS-11643-1,..7, GB 2312, GB 1988, etc.
      Big5
  * Korean
      EUC-KR - US_ASCII, KS X 1001 (KS C 5601)
      ISO-2022-KR - US_ASCII, KS X 1001 (KS C 5601), etc.
  * Vietnamese
      TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
  * Thai
      TIS-620 (ISO-8859-11), CP874
  * Latin etc.
      US_ASCII, ISO-8859-1 〜 10, 13 〜 15,
      KOI8-R, NeXT,
      CP424, CP437, CP737, CP775, CP850, CP852, CP855, CP856, CP857,
      CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
      CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
  * Unicode (UCS-2)
      UTF-8

  NOTE:
    * The left half of JIS X 0201 and GB 1988 (Chinese ASCII) are
      treated as US_ASCII because they are used in the tags of HTML
      documents. Other variants of US_ASCII are left unchanged.
    * JIS C 6226 (old JIS) is treated as JIS X 0208.
    * The right half of JIS X 0201 (Katakana) is kept as is and is not
      converted to JIS X 0208, unless w3m is run with the -K option.
    * The sequence '~\n' of HZ is not supported.
    * UTF-8 is converted to the ISO-2022-based character sets at load
      time.
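
  The Katakana note above can be illustrated outside w3m. The following
  is only a sketch in Python, not code from the patch: it assumes that
  Unicode NFKC normalization is an acceptable stand-in for the
  JIS X 0201 -> JIS X 0208 Katakana mapping that -K performs, and the
  helper name fold_halfwidth_katakana is hypothetical.

```python
import re
import unicodedata

# U+FF61..U+FF9F are the halfwidth forms that correspond to the right
# half of JIS X 0201 (Katakana and its punctuation).
_HALFWIDTH_RUN = re.compile(r"[\uFF61-\uFF9F]+")


def fold_halfwidth_katakana(text: str) -> str:
    """Replace runs of halfwidth Katakana with their fullwidth forms.

    Assumption: NFKC is used here as a stand-in mapping. Whole runs are
    normalized so that a voiced sound mark can combine with the
    preceding letter. Characters outside the range are left untouched.
    """
    return _HALFWIDTH_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group(0)), text
    )


half = "\uFF76\uFF9E\uFF72\uFF84\uFF9E"   # halfwidth KA + voiced mark, I, TO + voiced mark
full = fold_halfwidth_katakana(half)      # fullwidth GA, I, DO

# In EUC-JP a halfwidth character is SS2 (0x8E) plus one byte, while a
# JIS X 0208 character is two bytes with the high bit set.
print(half.encode("euc_jp").hex(" "))
print(full.encode("euc_jp").hex(" "))
```

  A voiced sound mark that does not follow a letter it can combine with
  is not handled by this sketch.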

Code conversion

  The following special code conversions are supported.
    * EUC-JP <-> ISO-2022-JP <-> Shift-JIS
    * EUC-CN <-> ISO-2022-CN <-> HZ
    * EUC-TW <-> ISO-2022-CN <-> Big5
    * EUC-KR <-> ISO-2022-KR

  Any other code conversion is attempted using Unicode mapping tables.
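
  The idea of converting through Unicode mapping tables can be sketched
  as follows. This is not the patch's implementation: it uses the codecs
  shipped with Python, the function name convert is hypothetical, and
  "gb2312" is Python's name for EUC-CN.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one coding system to another.

    Unicode is the pivot: decode with the source mapping table, then
    encode with the destination mapping table.
    """
    return data.decode(src).encode(dst)


# EUC-JP -> Shift-JIS
euc_jp = b"\xc6\xfc\xcb\xdc\xb8\xec"      # "Japanese language" written in EUC-JP
sjis = convert(euc_jp, "euc_jp", "shift_jis")

# EUC-CN -> HZ (7-bit, with ~{ and ~} around the GB 2312 part)
euc_cn = b"\xd6\xd0\xce\xc4"              # "Chinese text" written in EUC-CN
hz = convert(euc_cn, "gb2312", "hz")

print(sjis.hex(" "))
print(hz)
```

  A character that has no entry in the destination table makes encode()
  raise UnicodeEncodeError, which corresponds to a conversion that is
  tried but fails.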

Display

  If a coding system based on 7-bit ISO-2022-* is used as the display
  code, several ISO-2022-based character sets can be displayed mixed
  together. If another coding system is used, code conversion via
  Unicode is attempted.
  
  NOTE:
    * UTF-8 cannot be used as the display coding system.
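
  To show what mixing character sets in one 7-bit ISO-2022 stream looks
  like, here is a small sketch. It assumes Python's "iso2022_jp_2" codec
  as a stand-in; the text above does not say which ISO-2022-* variant
  w3m emits, so take the codec choice as an illustration only.

```python
# Japanese, Korean and simplified Chinese characters in one string.
text = "\u65e5\u672c\u8a9e \ud55c\uae00 \u7b80"

# ISO-2022-JP-2 switches character sets with escape sequences, so all
# three scripts fit into a single 7-bit byte stream.
encoded = text.encode("iso2022_jp_2")

is_seven_bit = max(encoded) < 0x80
escape_count = encoded.count(b"\x1b")
round_trip_ok = encoded.decode("iso2022_jp_2") == text

print(encoded)
print(is_seven_bit, escape_count, round_trip_ok)
```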
   
Options
  
  -O <coding system>
      Set display/output coding system.
  -I <coding system>
      Set the document coding system. A coding system other than
      ISO-2022-* can be set.
      <coding system>:
         j(p): ISO-2022-JP - JIS X 0208 / US_ASCII
         j1: ISO-2022-JP - JIS X 0208 / JIS X 0201
         j2: ISO-2022-JP - JIS C 6226 / US_ASCII
         j3: ISO-2022-JP - JIS C 6226 / JIS X 0201
         cn: ISO-2022-CN - GB 2312 / CNS 11643
         kr: ISO-2022-KR - KS X 1001
         e(j): EUC-JP
         ec: EUC-CN
         et: EUC-TW
         ek: EUC-KR
         s(jis): Shift-JIS
         h(z): HZ
         b(ig5): Big5
         l?: ISO-8859-?
         t(is): TIS-620(ISO-8859-11)
         tc(vn): TCVN-5712 VN-1
         v(iscii): VISCII 1.1
         vp(s): VPS
         koi: KOI8-R
         n(ext): NeXT
         cp???: CP???
         w12??: CP12??
         u(tf8): UTF-8
  -L <language>
      Set the preferred language for the internal coding system.
      <language>:
         ja_JP(j), zh_CN(c), zh_TW(t), zh_TW_Big5(b), ko_KR(k)
  -K The Katakana part of JIS X 0201 is converted to JIS X 0208.
  -U Code conversion using Unicode is enabled.
  -G Use the Unicode map of GB 12345 as the map for GB 2312.
      This option is useful when GB 2312 is converted to JIS X 0208,
      CNS 11643 or Big5.

Option panel

  The display/output coding system, document coding system,
  preferred language, and conversion options can be set.

Change the coding system of the document

  The user can change the coding system of the document after loading.
  Press the '=' key and select the coding system of the document.

Line Editing

  The input coding system follows the display coding system.

  NOTE:
    * HZ and UTF-8 cannot be used as the input coding system.
    * Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
      SI(\017) and SO(\016) are already assigned to other command keys
      (SO is assigned to `next-history'). If you want to use SI and SO,
      press C-@(^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
      7-bit ISO-2022 are recognized. When you press C-@ again, the
      default bindings are restored.
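
  The clash between SI/SO and the command keys can be seen from the
  bytes alone. The sketch below is an illustration in Python, not part
  of w3m; control_byte is a hypothetical helper, and Python's
  "iso2022_kr" codec is assumed as the encoder.

```python
SO = 0x0E  # Shift Out, octal \016
SI = 0x0F  # Shift In,  octal \017


def control_byte(letter: str) -> int:
    """Return the byte a terminal sends for Ctrl-<letter>.

    The control code is the low five bits of the ASCII letter.
    """
    return ord(letter.upper()) & 0x1F


# Two Hangul syllables encoded in ISO-2022-KR: the KS X 1001 part is
# bracketed by SO ... SI.
encoded = "\ud55c\uae00".encode("iso2022_kr")

print(encoded)
print(control_byte("N") == SO)   # C-n sends the same byte as SO
print(control_byte("O") == SI)   # C-o sends the same byte as SI
print(control_byte("@"))         # C-@ sends NUL (^@)
```

  Since an input method sends SO and SI as ordinary bytes, a line editor
  that binds C-n and C-o to commands cannot tell them apart from the
  shift codes, which is why a separate toggle key is needed.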

Regular expression (not supported)

   I have no plans to support multilingual regular expressions,
   because in most cases they are not necessary.
-------------------------------------------
Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
 http://www2u.biglobe.ne.jp/~hsaka/


