| 114 |
<P> |
<P> |
| 115 |
Otherwise, this will be a mere document only on Japanization, |
Otherwise, this will be a mere document only on Japanization, |
| 116 |
because the original author Tomohiro KUBOTA |
because the original author Tomohiro KUBOTA |
| 117 |
(<email>kubota@debian.or.jp</email>) |
(<email>kubota@debian.org</email>) |
| 118 |
speaks Japanese and lives in Japan. |
speaks Japanese and lives in Japan. |
| 119 |
</P> |
</P> |
| 120 |
|
|
| 814 |
</P> |
</P> |
| 815 |
|
|
| 816 |
<P> |
<P> |
| 817 |
|
These varieties of character sets and encodings will tell you about |
| 818 |
|
the struggles of people around the world to handle their own languages on |
| 819 |
|
computers. In particular, CJK people had to work out various |
| 820 |
|
technologies to use large numbers of characters within ASCII-based computer |
| 821 |
|
systems. |
| 822 |
|
</P> |
| 823 |
|
|
| 824 |
|
<P> |
| 825 |
If you are planning to develop a text-processing software |
If you are planning to develop a text-processing software |
| 826 |
beyond the fields which the LOCALE technology covers, you will |
beyond the fields which the LOCALE technology covers, you will |
| 827 |
have to understand the following descriptions very well. |
have to understand the following descriptions very well. |
| 1091 |
<item>and more |
<item>and more |
| 1092 |
</list> |
</list> |
| 1093 |
</list> |
</list> |
| 1094 |
A more detailed list of these character set is found at |
The complete list of these coded character sets is found at |
| 1095 |
<url id="http://www.kudpc.kyoto-u.ac.jp/~yasuoka/CJK.html">. |
<url id="http://www.itscj.ipsj.or.jp/ISO-IR/" |
| 1096 |
<footnote> |
name="International Register of Coded Character Sets">. |
|
WHERE CAN I FIND THE COMPLETE AND AUTHORITATIVE TABLE OF THIS? |
|
|
</footnote> |
|
| 1097 |
</P> |
</P> |
| 1098 |
|
|
| 1099 |
<P> |
<P> |
| 1602 |
simplified Chinese character, traditional Chinese one, Japanese one, or |
simplified Chinese character, traditional Chinese one, Japanese one, or |
| 1603 |
Korean one. One method is to supply four fonts of simplified Chinese |
Korean one. One method is to supply four fonts of simplified Chinese |
| 1604 |
version, traditional Chinese version, Japanese version, and Korean version. |
version, traditional Chinese version, Japanese version, and Korean version. |
| 1605 |
Commercial OS vender can release localized version of their OS --- |
Commercial OS vendor can release localized version of their OS --- |
| 1606 |
for example, Japanese version of MS Windows can include Japanese version |
for example, Japanese version of MS Windows can include Japanese version |
| 1607 |
of Unicode font (this is what they are exactly doing). However, how |
of Unicode font (this is what they are exactly doing). However, how |
| 1608 |
should XFree86 or Debian do? I don't know... |
should XFree86 or Debian do? I don't know... |
| 1668 |
programming more difficult. |
programming more difficult. |
| 1669 |
</P> |
</P> |
| 1670 |
|
|
| 1671 |
|
<P> |
| 1672 |
|
Fortunately, Debian and other UNIX-like systems will use UTF-8 |
| 1673 |
|
(not UTF-16) as the usual encoding for UCS. Thus, we don't need |
| 1674 |
|
to handle UTF-16 and surrogate pairs very often. |
| 1675 |
|
</P> |
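<P>
As a rough illustration of why surrogate pairs matter only for UTF-16,
the following sketch compares the size of one character outside the
Basic Multilingual Plane in UTF-16 code units and in UTF-8 bytes.
The class name <tt>SurrogateSketch</tt> and its helper methods are
hypothetical names chosen for this example only, and U+20000 is just
one convenient sample character.
</P>

```java
import java.nio.charset.StandardCharsets;

public class SurrogateSketch {
    // Number of UTF-16 code units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+20000 lies outside the BMP, so UTF-16 needs a surrogate pair.
        String s = new String(Character.toChars(0x20000));
        System.out.println(utf16Units(s));   // 2
        System.out.println(codePoints(s));   // 1
        System.out.println(utf8Bytes(s));    // 4
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

<P>
In UTF-8 the same character is simply a four-byte sequence, so code
that treats text as UTF-8 byte strings never sees surrogates at all.
</P>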
| 1676 |
|
|
| 1677 |
<sect2 id="646problem"><heading>ISO 646-* Problem</heading> |
<sect2 id="646problem"><heading>ISO 646-* Problem</heading> |
| 1678 |
|
|
| 1718 |
to ASCII, such as ISO 646-*. |
to ASCII, such as ISO 646-*. |
| 1719 |
</P> |
</P> |
| 1720 |
|
|
| 1721 |
|
<P> |
| 1722 |
|
<url id="http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html" |
| 1723 |
|
name="Problems and Solutions for Unicode and User/Vendor Defined |
| 1724 |
|
Characters"> discusses this problem. |
| 1725 |
|
</P> |
| 1726 |
|
|
| 1727 |
<sect id="othercodes"><heading>Other Character Sets and Encodings</heading> |
<sect id="othercodes"><heading>Other Character Sets and Encodings</heading> |
| 1728 |
|
|
| 3970 |
encodings. |
encodings. |
| 3971 |
</P> |
</P> |
| 3972 |
|
|
| 3973 |
|
<sect id="java"><heading>Java</heading> |
| 3974 |
|
|
| 3975 |
|
<p> |
| 3976 |
|
Full internationalization follows naturally from |
| 3977 |
|
Java's "Write Once, Run Anywhere" principle. |
| 3978 |
|
To achieve this, Java uses Unicode as the internal code |
| 3979 |
|
for <tt>char</tt> and <tt>String</tt>. It is important |
| 3980 |
|
that Unicode is the <em>internal</em> code. Java obeys |
| 3981 |
|
the current LOCALE, and the encoding is automatically |
| 3982 |
|
converted for I/O. Thus, <em>users</em> of applications written |
| 3983 |
|
in Java don't need to be aware of Unicode. |
| 3984 |
|
</p> |
| 3985 |
|
|
| 3986 |
|
<p> |
| 3987 |
|
Then how about <em>developers</em>? They also don't need |
| 3988 |
|
to be aware of the internal encoding. Character processing |
| 3989 |

such as counting the number of characters in a string works well. |
| 3990 |
|
Moreover, you don't have to worry about display or input. |
| 3991 |
|
</p> |
| 3992 |
|
|
| 3993 |
|
<p> |
| 3994 |
|
However, you may want to handle specific encodings, |
| 3995 |

for example, for MIME encoding/decoding. For such purposes, |
| 3996 |
|
I/O can be done by specifying an external encoding. |
| 3997 |
|
Check <tt>InputStreamReader</tt> and <tt>OutputStreamWriter</tt> |
| 3998 |
|
classes. You can also convert between the internal encoding |
| 3999 |
|
and specified encodings by |
| 4000 |
|
<tt>String.getBytes(</tt><em>encoding</em><tt>)</tt> and |
| 4001 |
|
<tt>String(byte []</tt> <em>bytes</em><tt>, </tt><em>encoding</em><tt>)</tt>. |
| 4002 |
|
</p> |
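<p>
The following is a minimal sketch of the conversions described above,
not a definitive implementation. The class name <tt>EncodingSketch</tt>
and the helpers <tt>writeAs</tt> and <tt>readAs</tt> are hypothetical
names invented for this example. UTF-8 is used only because every Java
runtime is required to support it; the sketch passes a
<tt>Charset</tt> object where the text above passes an encoding name,
and other encoding names can be substituted if the runtime provides them.
</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class EncodingSketch {
    // Write a String through an OutputStreamWriter with an explicit
    // external encoding and return the resulting bytes.
    static byte[] writeAs(String text, String encoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, Charset.forName(encoding))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Read bytes back through an InputStreamReader with an explicit
    // external encoding and return the decoded String.
    static String readAs(byte[] bytes, String encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(encoding))) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three Japanese characters
        Charset utf8 = Charset.forName("UTF-8");

        // Stream-based conversion.
        byte[] viaWriter = writeAs(text, "UTF-8");
        // Direct conversion with String.getBytes and the String constructor.
        byte[] direct = text.getBytes(utf8);

        System.out.println(viaWriter.length);                      // 9
        System.out.println(Arrays.equals(viaWriter, direct));      // true
        System.out.println(readAs(viaWriter, "UTF-8").equals(text)); // true
        System.out.println(new String(direct, utf8).equals(text)); // true
    }
}
```

<p>
Both routes give the same bytes here; the stream classes are the more
natural choice for file or network I/O, while the <tt>String</tt>
methods suit short in-memory conversions.
</p>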
| 4003 |
|
|
| 4004 |
|
|
| 4005 |
|
|
| 4006 |
|
|
| 4007 |
<sect id="shellscript"><heading>Shell Script</heading> |
<sect id="shellscript"><heading>Shell Script</heading> |
| 4008 |
|
|
| 4009 |
<P>***** Not written yet *****</P> |
<P>***** Not written yet *****</P> |
| 4121 |
Note that both coded character sets (for example, KS_C_5601-1987, |
Note that both coded character sets (for example, KS_C_5601-1987, |
| 4122 |
MIBenum 36) and encodings (for example, ISO-2022-KR, MIBenum: 37) |
MIBenum 36) and encodings (for example, ISO-2022-KR, MIBenum: 37) |
| 4123 |
are registered. How confusing! |
are registered. How confusing! |
| 4124 |
|
<item> |
| 4125 |
|
<url id="http://www.itscj.ipsj.or.jp/ISO-IR/" |
| 4126 |
|
name="International Register of Coded Character Sets"> |
| 4127 |
|
A complete list of registered CCS, with ISO 2022 escape sequences. |
| 4128 |
|
PDF files for these CCS are also available. |
| 4129 |
</list> |
</list> |
| 4130 |
Characters (ISO 8859) |
Characters (ISO 8859) |
| 4131 |
<list> |
<list> |
| 4132 |
<item> |
<item> |
| 4133 |
<url id="http://czyborra.com/charsets/iso8859.html"> |
<url id="http://czyborra.com/charsets/iso8859.html" |
| 4134 |
|
name="ISO 8859 Alphabet Soup"> |
| 4135 |
<item> |
<item> |
| 4136 |
<url id="http://park.kiev.ua/multiling/ml-docs/iso-8859.html"> |
<url id="http://park.kiev.ua/multiling/ml-docs/iso-8859.html" |
| 4137 |
<item> |
name="ISO 8859 Character Sets"> |
|
<url id="http://www.terena.nl/projects/multiling/ml-docs/iso-8859.html"> |
|
| 4138 |
</list> |
</list> |
| 4139 |
Characters (ISO 2022) |
Characters (ISO 2022) |
| 4140 |
<list> |
<list> |
| 4146 |
Characters (ISO 10646 and Unicode) |
Characters (ISO 10646 and Unicode) |
| 4147 |
<list> |
<list> |
| 4148 |
<item><url id="http://www.unicode.org/" name="Unicode Consortium"> |
<item><url id="http://www.unicode.org/" name="Unicode Consortium"> |
| 4149 |
|
<item> |
| 4150 |
|
<url id="http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html" |
| 4151 |
|
name="Problems and Solutions for Unicode and User/Vendor Defined |
| 4152 |
|
Characters"> |
| 4153 |
</list> |
</list> |
| 4154 |
</P> |
</P> |
| 4155 |
|
|
| 4204 |
<url id="http://clisp.cons.org/~haible/packages-libutf8.html" |
<url id="http://clisp.cons.org/~haible/packages-libutf8.html" |
| 4205 |
name="libutf8 - a Unicode/UTF-8 locale plugin"> provides |
name="libutf8 - a Unicode/UTF-8 locale plugin"> provides |
| 4206 |
UTF-8 locale support for systems which don't have UTF-8 locales. |
UTF-8 locale support for systems which don't have UTF-8 locales. |
| 4207 |
|
<item> |
| 4208 |
|
<url id="http://www.pango.org/" name="Pango"> is a project to |
| 4209 |
|
develop a portable high-quality text rendering engine. |
| 4210 |
</list> |
</list> |
| 4211 |
</P> |
</P> |
| 4212 |
|
|