Diff of /manuals/trunk/intro-i18n/intro-i18n.sgml

revision 1052 by kubota, Thu Nov 23 06:15:51 2000 UTC revision 1057 by kubota, Tue Nov 28 05:48:43 2000 UTC
# Line 114  These chapters are consist of contributi Line 114  These chapters are consist of contributi
114  <P>  <P>
115  Otherwise, this will be a mere document only on Japanization,  Otherwise, this will be a mere document only on Japanization,
116  because the original author Tomohiro KUBOTA  because the original author Tomohiro KUBOTA
117  (<email>kubota@debian.or.jp</email>)  (<email>kubota@debian.org</email>)
118  speaks Japanese and live in Japan.  speaks Japanese and live in Japan.
119  </P>  </P>
120    
# Line 814  processing of existing character codes, Line 814  processing of existing character codes,
814  </P>  </P>
815    
816  <P>  <P>
817    This variety of character sets and encodings reflects the
818    struggles of people around the world to handle their own languages on
819    computers.  CJK people in particular had to work out various
820    technologies to use large numbers of characters within ASCII-based computer
821    systems.
822    </P>
823    
824    <P>
825  If you are planning to develop a text-processing software  If you are planning to develop a text-processing software
826  beyond the fields which the LOCALE technology covers, you will  beyond the fields which the LOCALE technology covers, you will
827  have to understand the following descriptions very well.  have to understand the following descriptions very well.
# Line 1083  where 'F' is determined for each charact Line 1091  where 'F' is determined for each charact
1091       <item>and more       <item>and more
1092      </list>      </list>
1093  </list>  </list>
1094  A more detailed list of these character set is found at  The complete list of these coded character sets is found at
1095  <url id="http://www.kudpc.kyoto-u.ac.jp/~yasuoka/CJK.html">.  <url id="http://www.itscj.ipsj.or.jp/ISO-IR/"
1096  <footnote>   name="International Register of Coded Character Sets">.
 WHERE CAN I FIND THE COMPLETE AND AUTHORITATIVE TABLE OF THIS?  
 </footnote>  
1097  </P>  </P>
1098    
1099  <P>  <P>
# Line 1596  Unicode font vendors will hesitate to ch Line 1602  Unicode font vendors will hesitate to ch
1602  simplified Chinese character, traditional Chinese one, Japanese one, or  simplified Chinese character, traditional Chinese one, Japanese one, or
1603  Korean one.  One method is to supply four fonts of simplified Chinese  Korean one.  One method is to supply four fonts of simplified Chinese
1604  version, traditional Chinese version, Japanese version, and Korean version.  version, traditional Chinese version, Japanese version, and Korean version.
1605  Commercial OS vender can release localized version of their OS ---  Commercial OS vendor can release localized version of their OS ---
1606  for example, Japanese version of MS Windows can include Japanese version  for example, Japanese version of MS Windows can include Japanese version
1607  of Unicode font (this is what they are exactly doing).  However, how  of Unicode font (this is what they are exactly doing).  However, how
1608  should XFree86 or Debian do?  I don't know...  should XFree86 or Debian do?  I don't know...
# Line 1662  are expressed with the same width of bit Line 1668  are expressed with the same width of bit
1668  programming more difficult.  programming more difficult.
1669  </P>  </P>
1670    
1671    <P>
1672    Fortunately, Debian and other UNIX-like systems will use UTF-8
1673    (not UTF-16) as the usual encoding for UCS.  Thus, we seldom need
1674    to handle UTF-16 and surrogate pairs.
1675    </P>
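<p>
As a rough illustration of why surrogate pairs are a UTF-16 concern and not a
UTF-8 one, the following Java sketch compares the three ways of measuring one
character outside the Basic Multilingual Plane.  The class name
<tt>SurrogateSketch</tt>, its helper methods, and the sample code point U+20000
are chosen only for this example and do not come from the manual.
</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class SurrogateSketch {
    // Number of 16-bit UTF-16 code units (what String.length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. surrogate pairs count once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes after encoding the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+20000 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
        String s = new String(Character.toChars(0x20000));
        System.out.println("UTF-16 code units: " + utf16Units(s));
        System.out.println("code points:       " + codePoints(s));
        System.out.println("UTF-8 bytes:       " + utf8Bytes(s));
    }
}
```

<p>
In UTF-8 the same character is simply one more multibyte sequence, so code that
already handles multibyte characters needs no separate surrogate logic.
</p>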
1676    
1677  <sect2 id="646problem"><heading>ISO 646-* Problem</heading>  <sect2 id="646problem"><heading>ISO 646-* Problem</heading>
1678    
# Line 1707  Thus all local codesets should not use c Line 1718  Thus all local codesets should not use c
1718  to ASCII, such as ISO 646-*.  to ASCII, such as ISO 646-*.
1719  </P>  </P>
1720    
1721    <P>
1722    <url id="http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html"
1723    name="Problems and Solutions for Unicode and User/Vendor Defined
1724    Characters"> discusses this problem.
1725    </P>
1726    
1727  <sect id="othercodes"><heading>Other Character Sets and Encodings</heading>  <sect id="othercodes"><heading>Other Character Sets and Encodings</heading>
1728    
# Line 3954  don't have to aware of it.  This is beca Line 3970  don't have to aware of it.  This is beca
3970  encodings.  encodings.
3971  </P>  </P>
3972    
3973    <sect id="java"><heading>Java</heading>
3974    
3975    <p>
3976    Full internationalization follows naturally from
3977    Java's "Write Once, Run Anywhere" principle.
3978    To achieve this, Java uses Unicode as the internal code
3979    for <tt>char</tt> and <tt>String</tt>.  It is important
3980    that Unicode is the <em>internal</em> code.  Java obeys
3981    the current LOCALE, and the encoding is automatically
3982    converted for I/O.  Thus, <em>users</em> of applications written
3983    in Java don't need to be aware of Unicode.
3984    </p>
3985    
3986    <p>
3987    Then how about <em>developers</em>?  They also don't need
3988    to be aware of the internal encoding.  Character processing
3989    such as counting the number of characters in a string works well.
3990    Moreover, you don't have to worry about display/input.
3991    </p>
3992    
3993    <p>
3994    However, you may want to handle specific encodings, for
3995    example for MIME encoding/decoding.  For such purposes,
3996    I/O can be done by specifying the external encoding.
3997    Check the <tt>InputStreamReader</tt> and <tt>OutputStreamWriter</tt>
3998    classes.  You can also convert between the internal encoding
3999    and specified encodings with
4000    <tt>String.getBytes(</tt><em>encoding</em><tt>)</tt> and
4001    <tt>String(byte []</tt> <em>bytes</em><tt>, </tt><em>encoding</em><tt>)</tt>.
4002    </p>
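<p>
A minimal sketch of such explicit conversion follows.  It assumes a Java release
that provides the <tt>Charset</tt> overloads and try-with-resources, both of
which are newer than this text.  The class name <tt>EncodingSketch</tt> and the
methods <tt>encode</tt> and <tt>decode</tt> are hypothetical names for this
example.  It writes text through an <tt>OutputStreamWriter</tt> (the writer
counterpart of <tt>InputStreamReader</tt>) and reads it back through an
<tt>InputStreamReader</tt>, each with a named external encoding.
</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
class EncodingSketch {
    // Convert internal Unicode text to bytes in the named external encoding.
    static byte[] encode(String text, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are collected.
        return sink.toByteArray();
    }

    // Convert bytes in the named external encoding back to internal Unicode text.
    static String decode(byte[] bytes, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        StringBuilder text = new StringBuilder();
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        byte[] latin1 = encode(text, "ISO-8859-1");
        // The same conversion done directly on the String.
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("round trip equal: "
            + decode(latin1, "ISO-8859-1").equals(new String(utf8, Charset.forName("UTF-8"))));
    }
}
```

<p>
The byte counts differ between the two encodings, while the decoded
<tt>String</tt> values compare equal, because both are held in the same
internal code.
</p>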
4003    
4004    
4005    
4006    
4007  <sect id="shellscript"><heading>Shell Script</heading>  <sect id="shellscript"><heading>Shell Script</heading>
4008    
4009  <P>***** Not written yet *****</P>  <P>***** Not written yet *****</P>
# Line 4071  Characters (general) Line 4121  Characters (general)
4121     Note that both coded character sets (for example, KS_C_5601-1987,     Note that both coded character sets (for example, KS_C_5601-1987,
4122     MIBenum 36) and encodings (for example, ISO-2022-KR, MIBenum: 37)     MIBenum 36) and encodings (for example, ISO-2022-KR, MIBenum: 37)
4123     are registered.  How confusing!     are registered.  How confusing!
4124     <item>
4125       <url id="http://www.itscj.ipsj.or.jp/ISO-IR/"
4126       name="International Register of Coded Character Sets">
4127       A complete list of registered CCS, with ISO 2022 escape sequences.
4128       PDF files for these CCS are also available.
4129  </list>  </list>
4130  Characters (ISO 8859)  Characters (ISO 8859)
4131  <list>  <list>
4132   <item>   <item>
4133     <url id="http://czyborra.com/charsets/iso8859.html">     <url id="http://czyborra.com/charsets/iso8859.html"
4134       name="ISO 8859 Alphabet Soup">
4135   <item>   <item>
4136     <url id="http://park.kiev.ua/multiling/ml-docs/iso-8859.html">     <url id="http://park.kiev.ua/multiling/ml-docs/iso-8859.html"
4137   <item>     name="ISO 8859 Character Sets">
    <url id="http://www.terena.nl/projects/multiling/ml-docs/iso-8859.html">  
4138  </list>  </list>
4139  Characters (ISO 2022)  Characters (ISO 2022)
4140  <list>  <list>
# Line 4091  Characters (ISO 2022) Line 4146  Characters (ISO 2022)
4146  Characters (ISO 10646 and Unicode)  Characters (ISO 10646 and Unicode)
4147  <list>  <list>
4148   <item><url id="http://www.unicode.org/" name="Unicode Consortium">   <item><url id="http://www.unicode.org/" name="Unicode Consortium">
4149     <item>
4150       <url id="http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html"
4151       name="Problems and Solutions for Unicode and User/Vendor Defined
4152       Characters">
4153  </list>  </list>
4154  </P>  </P>
4155    
# Line 4145  Softwares Line 4204  Softwares
4204     <url id="http://clisp.cons.org/~haible/packages-libutf8.html"     <url id="http://clisp.cons.org/~haible/packages-libutf8.html"
4205     name="libutf8 - a Unicode/UTF-8 locale plugin"> provides     name="libutf8 - a Unicode/UTF-8 locale plugin"> provides
4206     UTF-8 locale support for systems which don't have UTF-8 locales.     UTF-8 locale support for systems which don't have UTF-8 locales.
4207     <item>
4208       <url id="http://www.pango.org/" name="Pango"> is a project to
4209       develop a portable high-quality text rendering engine.
4210  </list>  </list>
4211  </P>  </P>
4212    
