Contents of /manuals/trunk/intro-i18n/cyrillic.sgml

Revision 1115
Sun Mar 4 10:24:27 2001 UTC (12 years, 2 months ago) by kubota
File MIME type: text/x-sgml
File size: 2328 byte(s)
Added a chapter on Cyrillic languages (by Alexander Voropay).
<sect id="cyrillic"><heading>Languages with Cyrillic script</heading>

<P>
Section written by
Alexander Voropay <email>a.voropay@globalone.ru</email>.
</P>

<P>
Many languages are written in Cyrillic script.
</P>

<P>
Slavic languages: Russian (ru), Ukrainian (uk), Belarusian (be),
Bulgarian (bg), Serbian (sr), and Macedonian (mk).
</P>

<P>
Other Slavic languages (Polish (pl), Czech (cs), Croatian (hr)) use
Latin script, mainly encoded in ISO-8859-2 (Central European).
</P>

<P>
During the Soviet era, a number of non-Slavic languages received their own
alphabets based on modified Cyrillic characters: Azerbaijani (az),
Turkmen (tk), Kurdish (ku), Uzbek (uz), Kazakh (kk), Kirghiz (ky),
Tajik (tg), Mongolian (mn), Komi (kv), etc.
<list>
<item><url id="http://www.peoples.org.ru/eng_index.html">
<item><url id="http://www-hep.fzu.cz/~piska/">
<item><url id="http://www.srpsko-pismo.org/">
<item><url id="http://www.hr/hrvatska/language/CroLang.html">
<item><url id="http://ftp.fi.muni.cz/pub/localization/charsets/cs-encodings-faq">
</list>
</P>

<P>
Unicode has a rich Cyrillic section; its basic Cyrillic block occupies
U+0400 to U+04FF.
</P>

<P>
Unfortunately, there are also many 8-bit Cyrillic charsets. No single
8-bit charset can be universal: a charset that keeps ASCII in its lower
half has only 128 positions left, while, for example, there
are about 260 Cyrillic characters in the
<url
id="http://partners.adobe.com/asn/developer/PDFS/TN/5013.Cyrillic_Font_Spec.pdf"
name="Adobe Glyph List">.
</P>
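
<P>
The coverage problem can be illustrated with a minimal Python sketch.
It assumes the codec names shipped with Python's standard library
(<tt>koi8_r</tt>, <tt>cp1251</tt>, <tt>cp866</tt>, <tt>iso8859_5</tt>,
<tt>mac_cyrillic</tt>); the sample letters and the helper names
<tt>can_encode</tt> and <tt>coverage</tt> are chosen only for illustration.
</P>

```python
# Sketch: check which 8-bit Cyrillic charsets can represent a given letter.
# The sample letters and helper names are illustrative assumptions.

SAMPLES = {
    "Russian zhe": "\u0436",               # ж
    "Ukrainian yi": "\u0457",              # ї
    "Kazakh ka with descender": "\u049b",  # қ
}

# Codec names as spelled in Python's standard library.
CODECS = ["koi8_r", "cp1251", "cp866", "iso8859_5", "mac_cyrillic"]


def can_encode(char, codec):
    """Return True if the codec has a code point for this character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(samples=SAMPLES, codecs=CODECS):
    """Map each sample name to the list of codecs that can encode it."""
    return {
        name: [codec for codec in codecs if can_encode(char, codec)]
        for name, char in samples.items()
    }


if __name__ == "__main__":
    for name, supported in coverage().items():
        print(name, "->", supported or "none of the listed charsets")
```

<P>
A letter used only in Russian fits everywhere, while letters from other
Cyrillic alphabets fit into some of these charsets or into none of them.
</P>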

<P>
For an overview, see "<url id="http://czyborra.com/charsets/cyrillic.html"
name="The Cyrillic Charset Soup">".
</P>

<P>
The main problem with Russian is that at least six charsets are in active use:
<list>
<item>KOI8-R
<item>Windows-1251
<item>CP-866
<item>ISO-8859-5
<item>MAC-CYRILLIC
<item>ISO-IR-111
</list>
So Russian computers really live in a "charset mix", much like Japanese
ones with Shift-JIS, ISO-2022-JP and EUC-JP. E-mail can arrive in any of
these charsets, so your mail agent should understand all of them. Takasiganai.
</P>
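
<P>
The effect of this mix can be sketched in Python, assuming the codec
names of Python's standard library. ISO-IR-111 is left out because, to
my knowledge, the standard library has no codec for it. The helper names
<tt>encode_all</tt> and <tt>decode_declared</tt> and the sample word are
hypothetical.
</P>

```python
# Sketch: the same Russian word has different bytes in each charset, and
# decoding with the wrong charset silently yields unreadable text.
# Helper names and the sample word are illustrative assumptions.

TEXT = "Привет"  # "Hello"

# Five of the six charsets from the list; ISO-IR-111 is omitted.
CODECS = ["koi8_r", "cp1251", "cp866", "iso8859_5", "mac_cyrillic"]


def encode_all(text, codecs=CODECS):
    """Encode the text once per codec and return {codec: bytes}."""
    return {codec: text.encode(codec) for codec in codecs}


def decode_declared(raw, declared_charset):
    """Decode bytes with the charset declared by the sender,
    for example in a MIME Content-Type header."""
    return raw.decode(declared_charset)


if __name__ == "__main__":
    encoded = encode_all(TEXT)
    for codec, raw in encoded.items():
        print(codec, raw.hex())

    raw = encoded["koi8_r"]
    print(decode_declared(raw, "koi8_r"))  # correct charset: readable
    print(decode_declared(raw, "cp1251"))  # wrong charset: no error, wrong letters
```

<P>
Decoding with the wrong charset raises no error, which is why a mail
agent has to rely on the declared charset rather than on the bytes alone.
</P>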

<P>
In a POSIX environment you should set the FULL locale name (including
the charset field):
<example>
LANG=ru_RU.KOI8-R
LANG=ru_RU.ISO_8859-5
LANG=ru_RU.CP1251
</example>
and so on, to get proper sorting, character classification and readable
messages. Abbreviated forms such as "<tt>ru</tt>" or "<tt>ru_RU</tt>"
are a source of misunderstanding.
I hope that Unicode (<tt>LANG=ru_RU.UTF-8</tt>) will save
us in the near future...
</P>
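
<P>
As a rough illustration, the following Python sketch splits a locale
name of the common form <tt>language_TERRITORY.charset@modifier</tt>
and reports whether it is a full name. The names <tt>LocaleName</tt>,
<tt>split_locale</tt> and <tt>is_full</tt> are hypothetical helpers, not
part of any library, and the parsing is deliberately simplistic.
</P>

```python
# Sketch: split a POSIX-style locale name and check that it is "full",
# i.e. carries both a territory and a charset field.
# All names here are hypothetical helpers for illustration.
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    charset: Optional[str]
    modifier: Optional[str]


def split_locale(name):
    """Split 'ru_RU.KOI8-R@modifier' into its four parts."""
    base, _, modifier = name.partition("@")
    base, _, charset = base.partition(".")
    language, _, territory = base.partition("_")
    return LocaleName(language, territory or None,
                      charset or None, modifier or None)


def is_full(name):
    """True only if both the territory and the charset are given."""
    parts = split_locale(name)
    return bool(parts.territory and parts.charset)


if __name__ == "__main__":
    for name in ["ru", "ru_RU", "ru_RU.KOI8-R", "ru_RU.UTF-8"]:
        print(name, split_locale(name), "full" if is_full(name) else "abbreviated")
```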