<sect id="spanish"><heading>Spanish language / used in Spain,
most of America and Equatorial Guinea</heading>

<P>
Section written by
Eusebio C Rufian-Zilbermann <email>eusebio@acm.org</email>.
</P>

<P>
Spanish is one of the official languages in Spain,
the official language in most of the countries in the American
continent and an official language in Equatorial Guinea.
It is also spoken in many other regions where it is not an
official language. The other official languages in Spain are
Galician, Catalan and Basque. Each of these languages
has its own specific localization issues, which
are not described in this section of the document.
</P>

<P>
The Spanish language derives from the variety spoken in
the Castile region. The term Castilian is sometimes used
to refer to the Spanish language (particularly when an
author wants to stress the fact that there are other
languages spoken in Spain). Castilian and Spanish are
two names for the same language; they are not
different things.
</P>

<sect1 id="spanish-character"><heading>Characters used in Spanish</heading>

<P>
Spanish uses a Latin alphabet. The numerical characters
used in Spanish are the Arabic numerals.
</P>

<P>
The character that distinguishes Spanish from other languages written
in the Latin alphabet is the Ñ ('N' with tilde), which exists in uppercase and
lowercase versions. Vowels in Spanish may carry a mark (the acute accent)
on top of them to indicate which syllable is stressed. This accent is
required for orthography (written correctness) on lowercase vowels.
On uppercase vowels it has often been omitted in practice, although the
Academy's rules require it there as well. The letter 'u' may have a
dieresis (which looks like the German umlaut), both in uppercase and
lowercase forms.
</P>

<P>
Some punctuation signs are characteristic of the Spanish
language. The opening question mark (¿) and the opening
exclamation sign (¡) look like the English question mark and
exclamation sign rotated 180 degrees. The English question
mark and exclamation sign are referred to as the closing question
mark and closing exclamation sign. The small raised 'a' and 'o'
(ª and º, often drawn underlined) are
used mainly for ordinal numbers, similar to the small 'th'
in English ordinals.
</P>

<sect1 id="spanish-sets"><heading>Character Sets</heading>

<P>
UNE (Una Norma Española) is the National
Standards Organization in Spain. UNE is a member of the ISO, and
standards that have a one-to-one correspondence are usually
called by their ISO number rather than their UNE number.
</P>

<P>
ISO 8859-1, also known as ISO Latin-1, contains the characters
required for Spanish.
</P>

<sect1 id="spanish-codesets"><heading>Codesets</heading>

<P>
The codeset most commonly used for Spanish is ISO 8859-1.
The codepage Windows 1252, a.k.a. Windows Latin-1, is a
superset of ISO 8859-1 that adds some characters in the
range 128 to 159. Other codesets are Unicode, Macintosh Roman
(codepage 10000), MS-DOS Latin-1 (codepage 850) or, less
frequently, MS-DOS Latin US (codepage 437), which contains
the accented lowercase vowels but, of the accented uppercase
vowels, only É. Some
additional Latin codesets are EBCDIC CP500 and CP 1026
(used in IBM mainframes and terminal emulators),
Adobe Standard (used as default for Postscript documents),
Nextstep Latin, HP Roman 8 (for HPUX and Laserjet resident
printer fonts) and the Latin codepage in OS/2.
They are all stateless, 8-bit codepages (with the
exception of Unicode, whose common encodings use 16-bit units
or a variable number of bytes per character).
</P>
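
<P>
As a rough illustration of the differences between these codesets, the
following Python sketch reports which Spanish characters a given codec
cannot represent. The names <tt>SPANISH_CHARS</tt> and
<tt>missing_chars</tt> are invented for this example, and the codec
names are those of Python's standard library, not of this document.
</P>

```python
# Characters that Spanish text needs beyond plain ASCII (illustrative list).
SPANISH_CHARS = "áéíóúüñÁÉÍÓÚÜÑ¿¡ªº"


def missing_chars(encoding):
    """Return the Spanish characters that `encoding` cannot represent."""
    missing = []
    for ch in SPANISH_CHARS:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for name in ("latin-1", "cp1252", "cp850", "cp437", "mac_roman"):
        print(name, missing_chars(name) or "complete")
```

<P>
Under these assumptions, ISO 8859-1 and codepage 850 come out complete,
while codepage 437 lacks four of the accented uppercase vowels.
</P>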

<sect1 id="spanish-how"><heading>How These Codesets Are Used --- Information for Programmers</heading>

<P>
In most cases it is safe to use ISO 8859-1 characters. Some exceptions are:
<list>
<item>WWW browsers should recognize all codesets.
<item>Software which communicates with IBM mainframes, Macintosh,
MS-DOS, Nextstep, HPUX or OS/2 should handle the
corresponding encoding.
<item>File names on Joliet-format CD-ROMs used for Windows are
written in Unicode.
<item>Postscript interpreters should handle the Adobe Standard character set.
<item>Printer filters or drivers for HP printers should handle the
Roman-8 character set if using the internal fonts.
</list>
</P>
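
<P>
Software that exchanges text with one of these systems usually has to
convert between codesets. A minimal sketch in Python, assuming the
standard library codecs (<tt>transcode</tt> is a hypothetical helper
name, and ISO 8859-1 as the default target is only an assumption):
</P>

```python
def transcode(data, source, target="latin-1"):
    """Decode bytes from the `source` codeset and re-encode them in `target`.

    Raises UnicodeEncodeError if a character has no equivalent in `target`.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # "año" as stored by an MS-DOS program using codepage 850.
    dos_bytes = b"a\xa4o"
    print(transcode(dos_bytes, "cp850"))
```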

<sect1 id="spanish-columns"><heading>Columns</heading>

<P>
On console displays, each character occupies one column. Printed
text can be equally spaced (one column per character) or
proportionally spaced (a character can occupy fractionally
more or less than a column, depending on its shape).
</P>

<P>
Note: Even when using Traditional Sorting, ch and ll occupy
two columns. See the comment on Traditional sorting in
<ref id="spanish-sort">.
</P>
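
<P>
The column rule can be sketched as follows. The function name
<tt>columns</tt> is made up for this example, and the sketch only
assumes Latin-script text in which an accent may arrive either
precomposed or as a separate combining mark.
</P>

```python
import unicodedata


def columns(text):
    """Count console columns for Spanish text, one column per character."""
    # NFC merges a base letter and a combining accent into one character,
    # so an accented vowel or an ñ is counted once.
    return len(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    # ch and ll each count as two columns, ó as one.
    print(columns("chillón"))
```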

<sect1 id="spanish-direction"><heading>Writing Direction</heading>

<P>
Spanish is normally written in left to right lines arranged
from top to bottom of the page. For artistic purposes it might
be written in top to bottom columns arranged left to right
within the page. This columnar arrangement would be expected
only in graphic and charting programs (e.g., a drawing program,
a spreadsheet graph or a page layout program for composing
brochures) but regular text editors wouldn't be expected to
implement this style.
</P>

<sect1 id="spanish-layout"><heading>Layout of Characters</heading>

<P>
In the Spanish language, words are separated by spaces
and a line can be broken at a space, a punctuation sign or
a hyphenated word.
</P>

<P>
There are several sets of paired characters in Spanish.
Unlike English, question marks and exclamation signs are
also paired. Other paired characters are the same as in English
(parentheses, square brackets, and so forth). Opening
characters shouldn't appear at the end of a line.
Closing characters and punctuation signs such as period and
comma shouldn't appear at the beginning of a line.
</P>
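
<P>
A line-breaking routine could verify these two rules with a check along
the following lines. This is only a sketch: the character lists and the
name <tt>bad_breaks</tt> are assumptions made for the example, not a
complete inventory.
</P>

```python
OPENING = "¿¡([{"        # must not end a line
CLOSING = "?!)]}.,;:"    # must not begin a line


def bad_breaks(lines):
    """Return (line number, problem) pairs for lines that break the rules."""
    problems = []
    for number, line in enumerate(lines, 1):
        text = line.strip()
        if not text:
            continue
        if text[-1] in OPENING:
            problems.append((number, "opening character at end of line"))
        if text[0] in CLOSING:
            problems.append((number, "closing character at start of line"))
    return problems


if __name__ == "__main__":
    print(bad_breaks(["Hola, ¿", "cómo estás?"]))
```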

<P>
Words can be broken at a syllable boundary and hyphenated. Unlike
English, syllables in Spanish end in a vowel more often than
in a consonant. Syllables that end in a consonant are
typically at the end of a word or followed by a syllable
that starts with another consonant. However, the rules are
not completely consistent and a hyphenation
dictionary has to be used.
</P>

<sect1 id="spanish-lang"><heading>LANG variable</heading>

<P>
For Bash (the three <tt>set</tt> lines are readline settings and belong in
<tt>~/.inputrc</tt>; the <tt>export</tt> line belongs in a shell startup file)
<example>
set meta-flag on # keep all 8 bits for keyboard input
set output-meta on # keep all 8 bits for terminal output
set convert-meta off # don't convert 8-bit characters to escape sequences
export LC_CTYPE=ISO_8859_1
</example>
</P>

<P>
For Tcsh
<example>
setenv LANG C
setenv LC_CTYPE "iso_8859_1"
</example>
</P>

<sect1 id="spanish-input"><heading>Input from Keyboard</heading>

<P>
For the Spanish keyboard to work correctly, you need the command
<tt>loadkeys /usr/lib/kbd/keytables/es.map</tt> in the corresponding
startup (rc) file.
</P>

<P>
Most of the Spanish characters are input from the keyboard
with a single stroke. A two-key combination is used for accent
and dieresis marks above vowels. Traditional typewriters
used a 'dead key' system with keys that would
strike the paper without advancing the carriage to the
next character. Typing on a computer keyboard simulates
this behavior: typing the accent or dieresis key does
not produce any visible output until a vowel is typed
afterwards. Usually, if the accent or dieresis key is followed
by a consonant, the dead key is ignored. Accented or
dieresis characters cannot be used as shortcut keys
for selecting options.
</P>
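
<P>
The dead-key behavior described above can be sketched in Python as
follows. The table <tt>DEAD_KEYS</tt> and the function
<tt>type_after_dead_key</tt> are hypothetical names; real keyboard
drivers implement this in their keymaps rather than in application code.
</P>

```python
import unicodedata

# Dead key -> combining mark (acute accent and dieresis only, for brevity).
DEAD_KEYS = {"´": "\u0301", "¨": "\u0308"}


def type_after_dead_key(dead_key, letter):
    """Return the character produced by `letter` typed after `dead_key`."""
    if letter.lower() in "aeiou":
        # Compose the vowel and the mark into a single accented character.
        return unicodedata.normalize("NFC", letter + DEAD_KEYS[dead_key])
    # Followed by a consonant: the dead key is ignored.
    return letter


if __name__ == "__main__":
    print(type_after_dead_key("´", "a"), type_after_dead_key("´", "t"))
```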

<P>
The words for Yes and No are Sí (the character next to S is 'i'
with acute accent) and No. We would commonly use the S and N
keys for a Sí/No choice.
</P>
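
<P>
A prompt that accepts the S and N keys might interpret the answer as in
this sketch. The name <tt>parse_si_no</tt> is an assumption; only the
first letter is examined, so the accent in "Sí" never has to be typed.
</P>

```python
def parse_si_no(answer):
    """Return True for Sí, False for No and None for anything else."""
    first = answer.strip().lower()[:1]
    if first == "s":
        return True
    if first == "n":
        return False
    return None


if __name__ == "__main__":
    print(parse_si_no("S"), parse_si_no("no"), parse_si_no("x"))
```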

<P>
Spanish keyboards usually allow for typing not only the Spanish
accent signs, but also the accent signs in French and other
languages (grave accent, circumflex accent, umlaut on letters
other than the u). Another character that is typically available
is the cedilla C (which looks like a C with a comma underneath,
used for Catalan, Portuguese and French words, for example).
There is a Latin-American keyboard layout that does not contain
the grave accent and the cedilla C.
</P>

<sect1 id="spanish-more"><heading>More Detailed Discussions</heading>

<sect2 id="spanish-sort"><heading>Sorting</heading>

<P>
Traditional Spanish considered the combinations CH and LL to be
single letters. For usage in computers, this required additional
effort in sorting and character counting algorithms. It was
decided that the savings from not requiring special algorithms
were significant enough and that it would be acceptable to treat
them as 2 separate letters. Some software that had already
incorporated the special sorting algorithms now allows for
choosing between 'Traditional Spanish Sort' and 'Modern Spanish Sort'.
</P>

<P>
Accents and dieresis are ignored for sorting purposes. The
only exception is the rare case where two words are exactly
the same and the accent is the only difference; in that case the word with
the unaccented character should be sorted first. E.g.,
camión (c-a-m-i-o with acute accent-n), camionero,
este, éste (e with acute accent-s-t-e).
</P>

<P>
The ñ (n with tilde) is always sorted after the n and
before the o. It cannot be intermixed with the n.
</P>
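
<P>
The sorting rules of this section can be sketched as a Python sort key.
This is an illustration under stated assumptions, not a replacement for
a system locale: <tt>spanish_sort_key</tt> and its
<tt>traditional</tt> flag are invented names, and the sketch only
handles the letters a to z, ñ, and accented vowels.
</P>

```python
import unicodedata


def _rank(letter):
    # Leave a gap after every letter so that ñ (and, in the traditional
    # sort, ch and ll) can be placed right after n, c and l.
    return 2 * (ord(letter) - ord("a"))


def spanish_sort_key(word, traditional=False):
    """Build a key: base letters first, accents only as a tie-breaker."""
    text = unicodedata.normalize("NFC", word).lower()
    primary, secondary = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if traditional and text[i:i + 2] in ("ch", "ll"):
            primary.append(_rank(ch) + 1)   # ch after c, ll after l
            secondary.append(0)
            i += 2
            continue
        if ch == "ñ":
            primary.append(_rank("n") + 1)  # ñ after n, before o
            secondary.append(0)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            primary.append(_rank(decomposed[0]))
            secondary.append(1 if len(decomposed) > 1 else 0)
        i += 1
    return (primary, secondary)


if __name__ == "__main__":
    words = ["oso", "ñu", "nube", "éste", "este", "camionero", "camión"]
    print(sorted(words, key=spanish_sort_key))
```

<P>
Passing <tt>traditional=True</tt> makes the same key treat ch and ll as
single letters, so that "cuna" sorts before "chico".
</P>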

<sect2 id="spanish-number"><heading>Number format, date and
currency symbols</heading>

<P>
The use of the dot and the comma as a thousands separator and
for decimal places is usually the opposite of US English.
E.g., 1.000,00 instead of 1,000.00. Some Spanish-speaking countries,
notably Mexico, follow the same standards as the US. It is
desirable that programs can handle both forms, with the choice available
as a setting independent of the language.
</P>
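
<P>
One way to keep the separators configurable is sketched below. The
function name <tt>format_number</tt> and its parameters are assumptions
made for this example; the defaults follow the 1.000,00 form.
</P>

```python
def format_number(value, thousands_sep=".", decimal_sep=",", decimals=2):
    """Format `value` with the given thousands and decimal separators."""
    us_style = f"{value:,.{decimals}f}"          # e.g. "1,000.00"
    # Swap the separators through a placeholder so they do not collide.
    return (us_style.replace(",", "\0")
                    .replace(".", decimal_sep)
                    .replace("\0", thousands_sep))


if __name__ == "__main__":
    print(format_number(1000))              # form used in Spain
    print(format_number(1000, ",", "."))    # form used in Mexico and the US
```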

<P>
The usual date format is DD-MM-YYYY rather than MM-DD-YYYY, but
again this depends on the specific country. It is desirable to have
the date format as a configurable parameter.
</P>
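
<P>
A configurable date format could look like the following sketch, where
<tt>format_date</tt> and its <tt>day_first</tt> parameter are
hypothetical names chosen for the example.
</P>

```python
from datetime import date


def format_date(d, day_first=True, sep="-"):
    """Format a date as DD-MM-YYYY, or MM-DD-YYYY when day_first is False."""
    fields = ("%d", "%m", "%Y") if day_first else ("%m", "%d", "%Y")
    return d.strftime(sep.join(fields))


if __name__ == "__main__":
    print(format_date(date(1999, 12, 31)))
```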

<P>
The currency symbol can be prepended or appended to the number
and it can be one or several characters long. E.g., 100 PTA for
Spanish pesetas or N$ 100 for Mexican pesos. It is desirable that
the symbol and its position can be defined individually, and that
currency symbols longer than one character are allowed.
</P>
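
<P>
A sketch of a currency formatter in which the symbol and its position
are independent settings. The name <tt>format_currency</tt> and the
<tt>symbol_first</tt> flag are assumptions of this example.
</P>

```python
def format_currency(amount, symbol, symbol_first=False):
    """Attach a currency symbol of any length before or after the amount."""
    if symbol_first:
        return f"{symbol} {amount}"
    return f"{amount} {symbol}"


if __name__ == "__main__":
    print(format_currency("100", "PTA"))
    print(format_currency("100", "N$", symbol_first=True))
```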

<sect2 id="spanish-varieties"><heading>Varieties of Spanish</heading>

<P>
Spanish is spoken by a tremendous variety of people. Academics
throughout the different Spanish-speaking countries realized
that this could lead to a fragmentation of the language
and founded the Academy of the Spanish Language. This
academy has branches in most of the Spanish-speaking
countries: there is a Royal Academy of the Spanish Language
of Spain, an Academy of the Spanish Language of Mexico,
et cetera. The members of this Academy study the local
evolution of the language in each country. They meet
to maintain a body of knowledge of what should
be considered the Standard Spanish Language and what should
be considered local or regional terms and slang terms.
</P>

<P>
In most cases, software can use terms that are within the
Standard set by the Academy. When new terms appear (e.g.,
when a new product is created that has no previous name in
the Spanish language) each region typically starts using a
new word. When one or two terms become the
de-facto standard, the Academy incorporates the new
term into the Standard. This is a very slow process and
there will be temporary usages in different regions within
the Spanish-speaking world that conflict with each other.
Some people speak about Spain-Spanish and American-Spanish
but most of the time it doesn't really make sense to make
this distinction. First of all, even within America, there
are differences between the local varieties that may be greater
than the differences with Spain itself. E.g., Spanish as
spoken in Mexico, Colombia and Argentina may have as many
differences between them as each of them has when compared to
how it is spoken in Spain. A computer user in Ecuador may
feel more comfortable overall with the terms used in Spain
than with the terms used in Mexico (and of course, most
comfortable with the terms used in Ecuador itself!). The
options are to either produce one Spanish version of a
software product that is an acceptable compromise (maybe
not perfect) for all Spanish-speaking countries or to
produce multiple versions to account for all the regional
variations.
</P>

<P>
A plea to all the people who are localizing software into
Spanish: Let's use our efforts judiciously and create one
Spanish version and not many. Let's strive for a version
that conforms to the Standards and that can be as widely
accepted as possible for the areas not covered by the
Standards. Wouldn't you rather have a new product translated,
instead of two versions of a product where one matches your
local variety of the language?
</P>