<sect id="cyrillic"><heading>Languages with Cyrillic script</heading>
<P>
Section written by
Alexander Voropay <email>a.voropay@globalone.ru</email>.
</P>
<P>
First of all, many languages are written in the Cyrillic script.
</P>
<P>
Slavic languages: Russian (ru), Ukrainian (uk), Belarusian (be),
Bulgarian (bg), Serbian (sr), and Macedonian (mk).
</P>
<P>
Other Slavic languages, such as Polish (pl), Czech (cs) and Croatian (hr),
use the Latin script, mainly encoded in ISO-8859-2 (Central European).
</P>
<P>
During the Soviet era, several non-Slavic languages received their own
alphabets based on modified Cyrillic characters: Azerbaijani (az), Turkmen (tk),
Kurdish (ku), Uzbek (uz), Kazakh (kk), Kirghiz (ky), Tajik (tg), Mongolian (mn),
Komi (kv), etc.
Further reading:
<list>
<item><url id="http://www.peoples.org.ru/eng_index.html">
<item><url id="http://www-hep.fzu.cz/~piska/">
<item><url id="http://www.srpsko-pismo.org/">
<item><url id="http://www.hr/hrvatska/language/CroLang.html">
<item><url id="http://ftp.fi.muni.cz/pub/localization/charsets/cs-encodings-faq">
</list>
</P>
<P>
Unicode has a rich Cyrillic section: the main Cyrillic block alone spans
U+0400 to U+04FF.
</P>
<P>
Unfortunately, there are many 8-bit Cyrillic charsets. No single 8-bit
charset can be universal: it has room for at most 256 characters (and the
lower 128 are normally taken by ASCII), while there are about 260 Cyrillic
characters in the
<url
id="http://partners.adobe.com/asn/developer/PDFS/TN/5013.Cyrillic_Font_Spec.pdf"
name="Adobe Glyph List">.
</P>
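<P>
As an illustration of why no 8-bit charset can serve every Cyrillic
language, here is a minimal Python 3 sketch. The sample letters and the
helper name <tt>encodable_in</tt> are chosen only for this example, and
the charset names are Python codec names. It asks each codec whether it
can encode letters from several languages: Kazakh "ә" and Mongolian "ө"
fit into none of the 8-bit charsets tested, while UTF-8 encodes all of them.
</P>

```python
# Sketch: which Cyrillic letters can each charset represent?
# Charset names are Python codec names (an assumption of this example).
SAMPLES = {
    "Russian ы": "ы",
    "Ukrainian ї": "ї",
    "Belarusian ў": "ў",
    "Serbian ђ": "ђ",
    "Kazakh ә": "ә",
    "Mongolian ө": "ө",
}

CHARSETS = ["koi8_r", "cp1251", "iso8859_5", "utf_8"]


def encodable_in(charset):
    """Return the sample labels whose letter can be encoded in charset."""
    ok = []
    for label, letter in SAMPLES.items():
        try:
            letter.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset has no code point for the letter
        ok.append(label)
    return ok


if __name__ == "__main__":
    for charset in CHARSETS:
        print(charset, encodable_in(charset))
```

<P>
Relying on <tt>UnicodeEncodeError</tt> keeps the check tied to the codec
tables themselves rather than to a hand-written list of letters per charset.
</P>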
<P>
For an overview, see "<url id="http://czyborra.com/charsets/cyrillic.html"
name="The Cyrillic Charset Soup">".
</P>
<P>
The main problem with Russian is that at least six charsets are in active use:
<list>
<item>KOI8-R
<item>Windows-1251
<item>CP-866
<item>ISO-8859-5
<item>MAC-CYRILLIC
<item>ISO-IR-111
</list>
So Russian computers really live in a "charset mix", just like Japanese
ones with Shift-JIS, ISO-2022-JP and EUC-JP. You can receive e-mail in any
of these charsets, so your mail agent should understand all of them.
It can't be helped (<em>shikata ga nai</em>, as the Japanese say).
</P>
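<P>
To see why a mail agent must know the charset of every message, the Python 3
sketch below (assuming Python 3.8 or later for <tt>bytes.hex</tt> with a
separator; the sample word is arbitrary) encodes one Russian word in five of
the charsets listed above. ISO-IR-111 is left out because Python's standard
codecs do not include it. Each charset yields different bytes, and KOI8-R
bytes read as Windows-1251 turn "Привет" into "рТЙЧЕФ".
</P>

```python
# Sketch: one Russian word in five of the "live" Cyrillic charsets.
# Codec names are Python's; ISO-IR-111 is omitted (no standard codec).
WORD = "Привет"  # "Hi"

CHARSETS = ["koi8_r", "cp1251", "cp866", "iso8859_5", "mac_cyrillic"]


def encodings_of(text):
    """Map each charset name to the bytes of text in that charset."""
    return {name: text.encode(name) for name in CHARSETS}


if __name__ == "__main__":
    for name, data in encodings_of(WORD).items():
        print(f"{name:13} {data.hex(' ')}")
    # A mail agent that guesses the wrong charset shows garbage:
    print(WORD.encode("koi8_r").decode("cp1251"))
```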
<P>
In a POSIX environment you should set up the FULL locale name, including
the .Charset field:
<example>
LANG=ru_RU.KOI8-R
LANG=ru_RU.ISO_8859-5
LANG=ru_RU.CP1251
</example>
and so on. The charset field is needed for proper sorting, character
classification and readable messages. Any abbreviated form ("<tt>ru</tt>",
"<tt>ru_RU</tt>", etc.) is a source of misunderstanding, because the
charset it implies depends on the system's defaults.
I hope that the Unicode locale <tt>LANG=ru_RU.UTF-8</tt> will save
us in the near future...
</P>
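<P>
The point about full locale names can be sketched in Python 3. This is a
minimal sketch: the helpers <tt>charset_of</tt>, <tt>decode_for_locale</tt>
and <tt>try_setlocale</tt> are hypothetical, and the name format
language[_territory][.charset][@modifier] is assumed. Only a name with an
explicit .Charset field tells the program how to interpret bytes, so for
abbreviations such as "ru_RU" the sketch refuses to guess. Whether
<tt>locale.setlocale</tt> succeeds depends on which locales are installed.
</P>

```python
import locale

# Hypothetical helpers; assumes names of the form
# language[_territory][.charset][@modifier], e.g. "ru_RU.KOI8-R".


def charset_of(locale_name):
    """Return the .Charset field of a locale name, or None if it is missing."""
    name = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in name:
        return None  # abbreviated name such as "ru" or "ru_RU"
    return name.split(".", 1)[1]


def decode_for_locale(data, locale_name):
    """Decode bytes using the charset named in a full locale name."""
    charset = charset_of(locale_name)
    if charset is None:
        raise ValueError(f"{locale_name!r} has no charset field")
    return data.decode(charset)  # Python codecs accept "KOI8-R", "CP1251", ...


def try_setlocale(locale_name):
    """Activate the locale for LC_ALL; return False if it is not installed."""
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        return False
    return True


if __name__ == "__main__":
    for name in ["ru_RU.KOI8-R", "ru_RU.ISO_8859-5", "ru_RU.CP1251", "ru_RU"]:
        print(name, "charset:", charset_of(name), "installed:", try_setlocale(name))
```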