user-ja -- two sets of messages in ASCII and native codeset in the same language

The author of this section is Tomohiro KUBOTA (kubota@debian.or.jp).

Introduction

user-ja is a Debian-specific package that sets up a basic environment for Japanese-speaking beginners. user-ja does not apply these settings automatically; a user who needs a Japanese environment has to run user-ja-conf.

Because user-ja-conf is the tool that sets up a Japanese environment, it may itself be run in an environment with poor Japanese support. For example, user-ja-conf must not assume that Japanese characters can be displayed. Still, Japanese characters should be used wherever they can be displayed.

user-ja is a simple example of switching between two sets of messages: one written in ASCII characters and the other in Japanese characters. Note that both sets are in the Japanese language. This is beyond what gettext can do, since gettext chooses a message catalog by language rather than by whether the characters can be displayed.

Although user-ja is specific to Japanese, the problem of whether non-ASCII characters can be displayed is common to all languages written with non-ASCII characters.

Strategy

The following environments can display Japanese characters: kon (Kanji Console), kterm, and krxvt (in the rxvt-ml package). In addition, telnet clients for Windows and similar software may be able to display Japanese characters.

First, user-ja-conf detects the environment. If the environment can display Japanese characters, user-ja-conf proceeds. If not, it tries to set up a new environment and re-invokes itself there. If detection fails, it displays some Japanese characters and asks the user whether he or she can read them.
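The decision order above can be sketched as a pure function, so that it can be checked without a real terminal. This is only an illustration under assumptions: choose_action and its return values are hypothetical names, not part of user-ja, and the $has hash stands in for the `which` checks that user-ja-conf actually performs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from user-ja): the detection order as a pure function.
# $term, $display: values of $ENV{TERM} and $ENV{DISPLAY} (may be undef)
# $tty: output of tty(1), e.g. "/dev/tty1" or "/dev/pts/3"
# $has: hash ref of installed programs, e.g. { kterm => 1 }
sub choose_action {
    my ($term, $display, $tty, $has) = @_;
    $term    //= '';
    $display //= '';
    $tty     //= '';

    # Already running in a terminal known to display Japanese.
    return 'native' if $term eq 'kon' || $term eq 'kterm';

    # Under X: re-invoke ourselves in a Japanese-capable terminal emulator.
    if ($display ne '') {
        return 'kterm' if $has->{kterm};
        return 'krxvt' if $has->{krxvt};
    }

    # Detection failed: ask the user. On a Linux virtual console,
    # KON can additionally be offered if the user cannot read Kanji.
    return $tty =~ m{^/dev/tty[0-9]+$} ? 'ask_kon' : 'ask';
}
```

Keeping the decision separate from the `exec` calls means the order of checks can be tested on its own; the real script then only has to act on the returned value.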

Implementation

user-ja-conf is a Perl script. The following function checks whether native Japanese characters can be displayed and, if they cannot, tries to set up an environment where they can:

```perl
sub isNC($$) {
    my ($THIS, $OPT) = @_;    # path of this script and its options
    my $TTY = `/usr/bin/tty`;
    chomp $TTY;
    my $TERM    = $ENV{TERM}    // '';
    my $DISPLAY = $ENV{DISPLAY} // '';
    my $WHICH   = '/usr/bin/which';
    my $KANJI   = 0;
    my $answer;

    if ($TERM eq 'kon' || $TERM eq 'kterm') {
        # Already in a terminal that can display Japanese.
        $KANJI = 1;
    } elsif ($DISPLAY ne '' && system("$WHICH kterm >/dev/null") == 0) {
        # Under X with kterm available: re-invoke this script inside kterm.
        exec("kterm -km euc -e $THIS $OPT");
    } elsif ($DISPLAY ne '' && system("$WHICH krxvt >/dev/null") == 0) {
        exec("krxvt -km euc -e $THIS $OPT");
    } else {
        # Detection failed: show Japanese text and ask the user.
        print STDERR &sourceset2displayset(
            "(Japanese sentence in Japanese characters: 'Can you read this sentence?')\n");
        print STDERR "(Japanese sentence in ASCII characters: 'Can you read the above sentence written in Kanji?') [y/N] ";
        $answer = <STDIN>;
        if (defined $answer && $answer =~ /^\s*[yY]/) {
            $KANJI = 1;
        } elsif ($TTY =~ m#^/dev/tty[0-9]+$#) {
            # On a Linux virtual console, offer to start KON.
            print STDERR "(Japanese sentence in ASCII characters: 'Shall I invoke KON?') [Y/n] ";
            $answer = <STDIN>;
            exec("kon -e $THIS $OPT") unless defined $answer && $answer =~ /^\s*[nN]/;
        }
    }
    return $KANJI;
}
```
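The prompts in this function read a one-line answer and compare it against a default shown in brackets ("[y/N]" or "[Y/n]"). A small helper can make that rule explicit; reply_is_yes is a hypothetical name, not part of user-ja, and the sketch assumes that only an answer beginning with y/Y or n/N counts as a choice, while anything else falls back to the default.

```perl
use strict;
use warnings;

# Hypothetical helper for "[y/N]" / "[Y/n]" prompts.
# $reply: the line read from STDIN (undef at end of input)
# $default_yes: true for "[Y/n]", false for "[y/N]"
sub reply_is_yes {
    my ($reply, $default_yes) = @_;
    return $default_yes ? 1 : 0 unless defined $reply;
    return 1 if $reply =~ /^\s*[yY]/;    # explicit yes
    return 0 if $reply =~ /^\s*[nN]/;    # explicit no
    return $default_yes ? 1 : 0;         # empty or unrecognized answer
}
```

For example, the KON prompt would become `exec(...) if reply_is_yes(scalar <STDIN>, 1);`. Anchoring the match at the start of the answer avoids treating a reply such as "nobody" as a yes merely because it contains a "y".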

&sourceset2displayset($) is a function that converts a string from the codeset of the source code into the codeset used for display. This is needed because the codeset of the program source (in this case, a Perl script) and that of the dotfiles may differ. There are three popular codesets for Japanese: ISO-2022-JP, EUC-JP, and SHIFT-JIS. EUC-JP should be used for Perl source code because none of the bytes of its non-ASCII characters fall in the range 0x21-0x7e, so they can never be mistaken for ASCII characters that are significant to Perl, such as '\' (unlike SHIFT-JIS, whose second byte may be 0x5c). For display, however, ISO-2022-JP is the safest codeset, because EUC-JP and SHIFT-JIS are mutually exclusive: an environment set up for one cannot display the other. On the other hand, ISO-2022-JP is the most difficult codeset to implement, and some terminal environments (for example, Minicom) may not understand it. Dotfiles, meanwhile, may be written in any codeset, according to the user's preference and purpose.
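The body of sourceset2displayset is not shown here. One possible sketch uses Perl's core Encode module; it is not the user-ja implementation. It assumes that the source strings are EUC-JP octets and that the display codeset is held in a hypothetical package variable $DISPLAYSET, set elsewhere (for example from a user setting).

```perl
use strict;
use warnings;
use Encode ();

# Assumption: the Perl source (and so every literal message) is EUC-JP.
my $SOURCESET = 'euc-jp';

# Hypothetical: display codeset, e.g. 'euc-jp', 'shiftjis' or 'iso-2022-jp'.
our $DISPLAYSET = 'iso-2022-jp';

# Sketch only: convert EUC-JP octets into octets in the display codeset.
sub sourceset2displayset {
    my ($octets) = @_;
    # Nothing to do when source and display codesets are the same.
    return $octets if lc($DISPLAYSET) eq $SOURCESET;
    # Decode to Perl characters, then encode into the display codeset.
    my $chars = Encode::decode($SOURCESET, $octets);
    return Encode::encode($DISPLAYSET, $chars);
}
```

Going through Perl's internal character strings keeps the conversion table-driven: supporting another display codeset only means changing $DISPLAYSET, as long as Encode knows its name. Pure ASCII text passes through unchanged, while Kanji converted to ISO-2022-JP is wrapped in escape sequences such as ESC $ B.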

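The following function prints a message in the appropriate codeset. The 'Lang::' package qualifier can be ignored here.

```perl
# $_[0]: message written in ASCII characters
# $_[1]: the same message written in native (Japanese) characters
# $NC:   true when native characters can be displayed (presumably set from isNC())
sub disp ($$) {
    if ($NC) {
        print STDERR &Lang::sourceset2displayset($_[1]);
    } else {
        print STDERR $_[0];
    }
}
```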
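Because disp reads the global $NC and always writes to STDERR, it is hard to exercise in isolation. A variant that takes the flag, the output handle, and the conversion function as arguments shows the same switching logic in a testable form; disp_to is a hypothetical name, and the identity default for the conversion is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of disp: everything it depends on is passed in.
# $fh: output handle; $nc: true if native characters can be displayed
# $convert: source-to-display conversion (identity if omitted)
sub disp_to {
    my ($fh, $nc, $ascii_msg, $native_msg, $convert) = @_;
    $convert //= sub { $_[0] };
    print {$fh} $nc ? $convert->($native_msg) : $ascii_msg;
}
```

For example, `disp_to(\*STDERR, $NC, $ascii, $native, \&sourceset2displayset)` behaves like disp, while a test can pass an in-memory filehandle and check what was written.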

The following example shows how the disp function is used:

```perl
sub disp_finish() {
    &Sub::disp(<<EOF1, <<EOF2);
[Enter] key WO OSUTO KONO user-ja-conf HA SYUURYOU SHIMASU.
EOF1
(Japanese sentence in Japanese characters: 'Push [Enter] key to finish.')
EOF2
}
```

Here, the sentence '[Enter] key WO OSUTO ...' is the Japanese sentence written in the Latin alphabet.

In this way, almost every message is written twice and printed through the disp function.