Contents of /manuals/trunk/quick-reference/asciidoc/11_dataconv.txt

Revision 8989 - (show annotations) (download)
Tue Nov 29 16:05:18 2011 UTC (17 months, 3 weeks ago) by osamu
1 == Data conversion
2
3 // vim: set sts=2 expandtab:
4 // Use ":set nowrap" to edit table
5
6 Tools and tips for converting data formats on the Debian system are described.
7
Standards-based tools are in very good shape, but support for proprietary data formats is limited.
9
10 === Text data conversion tools
11
The following packages for text data conversion caught my eye.
13
14 .List of text data conversion tools
15 [grid="all"]
16 `----------`-------------`------------`-----------`-----------------------------------------------------------------------------------
17 package popcon size keyword description
18 --------------------------------------------------------------------------------------------------------------------------------------
19 `libc6` @-@popcon1@-@ @-@psize1@-@ charset text encoding converter between locales by `iconv`(1) (fundamental)
20 `recode` @-@popcon1@-@ @-@psize1@-@ charset+eol text encoding converter between locales (versatile, more aliases and features)
21 `konwert` @-@popcon1@-@ @-@psize1@-@ charset text encoding converter between locales (fancy)
22 `nkf` @-@popcon1@-@ @-@psize1@-@ charset character set translator for Japanese
23 `tcs` @-@popcon1@-@ @-@psize1@-@ charset character set translator
24 `unaccent` @-@popcon1@-@ @-@psize1@-@ charset replace accented letters by their unaccented equivalent
25 `tofrodos` @-@popcon1@-@ @-@psize1@-@ eol text format converter between DOS and Unix: `fromdos`(1) and `todos`(1)
26 `macutils` @-@popcon1@-@ @-@psize1@-@ eol text format converter between Macintosh and Unix: `frommac`(1) and `tomac`(1)
27 --------------------------------------------------------------------------------------------------------------------------------------
28
29 ==== Converting a text file with iconv
30
TIP: `iconv`(1) is provided as a part of the `libc6` package, so it is available on practically all Unix-like systems to convert the encoding of characters.
32
33 You can convert encodings of a text file with `iconv`(1) by the following.
34
35 --------------------
36 $ iconv -f encoding1 -t encoding2 input.txt >output.txt
37 --------------------
38
39 Encoding values are case insensitive and ignore "`-`" and "`_`" for matching. Supported encodings can be checked by the "`iconv -l`" command.
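
As a minimal sketch of the command above, the following converts a small ISO-8859-1 file to UTF-8 and dumps the resulting bytes. The `convert_encoding` function, its argument order, and the temporary file names are hypothetical and introduced here only for illustration. The function refuses identical input and output paths because the shell truncates the output file before `iconv`(1) reads it.

```shell
#!/bin/sh
# convert_encoding FROM TO INPUT OUTPUT   (hypothetical helper)
# Convert INPUT from encoding FROM to encoding TO and write it to OUTPUT.
convert_encoding() {
    # The redirection below would empty INPUT before iconv reads it,
    # so refuse the obvious case of identical paths.
    if [ "$3" = "$4" ]; then
        echo "convert_encoding: input and output must differ" >&2
        return 2
    fi
    iconv -f "$1" -t "$2" "$3" > "$4"
}

# Demo: write "cafe" with an accented e (0xE9 in ISO-8859-1) to a scratch file.
workdir=$(mktemp -d)
printf 'caf\351\n' > "$workdir/latin1.txt"
convert_encoding ISO-8859-1 UTF-8 "$workdir/latin1.txt" "$workdir/utf8.txt"

# Dump the converted bytes in hexadecimal.
od -An -tx1 "$workdir/utf8.txt"
rm -rf "$workdir"
```

The accented letter occupies 1 byte (`0xE9`) in ISO-8859-1 and 2 bytes (`0xC3 0xA9`) in UTF-8, so the converted file is 1 byte longer. The path comparison is only a simple string check and does not detect the same file reached through different paths.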
40
41 [[list-of-encoding-values]]
42 .List of encoding values and their usage
43 [grid="all"]
44 `---------------------------------------------------------`------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
45 encoding value usage
46 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
http://en.wikipedia.org/wiki/ASCII[ASCII] http://en.wikipedia.org/wiki/ASCII[American Standard Code for Information Interchange], 7 bit code w/o accented characters
48 http://en.wikipedia.org/wiki/UTF-8[UTF-8] current multilingual standard for all modern OSs
49 http://en.wikipedia.org/wiki/ISO/IEC_8859-1[ISO-8859-1] old standard for western European languages, ASCII + accented characters
50 http://en.wikipedia.org/wiki/ISO/IEC_8859-2[ISO-8859-2] old standard for eastern European languages, ASCII + accented characters
51 http://en.wikipedia.org/wiki/ISO/IEC_8859-15[ISO-8859-15] old standard for western European languages, http://en.wikipedia.org/wiki/ISO/IEC_8859-1[ISO-8859-1] with euro sign
52 http://en.wikipedia.org/wiki/Code_page_850[CP850] code page 850, Microsoft DOS characters with graphics for western European languages, http://en.wikipedia.org/wiki/ISO/IEC_8859-1[ISO-8859-1] variant
53 http://en.wikipedia.org/wiki/Code_page_932[CP932] code page 932, Microsoft Windows style http://en.wikipedia.org/wiki/Shift_JIS[Shift-JIS] variant for Japanese
54 http://en.wikipedia.org/wiki/Code_page_936[CP936] code page 936, Microsoft Windows style http://en.wikipedia.org/wiki/GB2312[GB2312], http://en.wikipedia.org/wiki/GBK[GBK] or http://en.wikipedia.org/wiki/GB18030[GB18030] variant for Simplified Chinese
55 http://en.wikipedia.org/wiki/Code_page_949[CP949] code page 949, Microsoft Windows style http://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-KR[EUC-KR] or Unified Hangul Code variant for Korean
56 http://en.wikipedia.org/wiki/Code_page_950[CP950] code page 950, Microsoft Windows style http://en.wikipedia.org/wiki/Big5[Big5] variant for Traditional Chinese
57 http://en.wikipedia.org/wiki/Windows-1251[CP1251] code page 1251, Microsoft Windows style encoding for the Cyrillic alphabet
http://en.wikipedia.org/wiki/Windows-1252[CP1252] code page 1252, Microsoft Windows style http://en.wikipedia.org/wiki/ISO/IEC_8859-1[ISO-8859-1] variant for western European languages
59 http://en.wikipedia.org/wiki/KOI8-R[KOI8-R] old Russian UNIX standard for the Cyrillic alphabet
60 http://en.wikipedia.org/wiki/ISO/IEC_2022[ISO-2022-JP] standard encoding for Japanese email which uses only 7 bit codes
61 http://en.wikipedia.org/wiki/Extended_Unix_Code[eucJP] old Japanese UNIX standard 8 bit code and completely different from http://en.wikipedia.org/wiki/Shift_JIS[Shift-JIS]
62 http://en.wikipedia.org/wiki/Shift_JIS[Shift-JIS] JIS X 0208 Appendix 1 standard for Japanese (see http://en.wikipedia.org/wiki/Code_page_932[CP932])
63 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
64
65 NOTE: Some encodings are only supported for the data conversion and are not used as locale values (<<_basics_of_encoding>>).
66
For character sets which fit in a single byte, such as the http://en.wikipedia.org/wiki/ASCII[ASCII] and http://en.wikipedia.org/wiki/ISO/IEC_8859[ISO-8859] character sets, the http://en.wikipedia.org/wiki/Character_encoding[character encoding] means almost the same thing as the character set.
68
For character sets with many characters, such as http://en.wikipedia.org/wiki/JIS_X_0213[JIS X 0213] for Japanese or http://en.wikipedia.org/wiki/Universal_Character_Set[Universal Character Set (UCS, Unicode, ISO-10646-1)] for practically all languages, there are many encoding schemes to fit them into a sequence of bytes.
70
71 - http://en.wikipedia.org/wiki/Extended_Unix_Code[EUC] and http://en.wikipedia.org/wiki/ISO/IEC_2022[ISO/IEC 2022 (also known as JIS X 0202)] for Japanese
72 - http://en.wikipedia.org/wiki/UTF-8[UTF-8], http://en.wikipedia.org/wiki/UTF-16/UCS-2[UTF-16/UCS-2] and http://en.wikipedia.org/wiki/UTF-32/UCS-4[UTF-32/UCS-4] for Unicode
73
For these, there is a clear distinction between the character set and the character encoding.

The term http://en.wikipedia.org/wiki/Code_page[code page] is used as a synonym for the character encoding table of some vendor-specific encodings.
77
NOTE: Most encoding systems share the same code with ASCII for the 7 bit characters, but there are some exceptions. If you are converting old Japanese C programs and URL data from the casually-called shift-JIS encoding format to UTF-8 format, use "`CP932`" as the encoding name instead of "`shift-JIS`" to get the expected results: `0x5C` -> "`\`" and `0x7E` -> "`\~`". Otherwise, these are converted to wrong characters.
79
80 TIP: `recode`(1) may be used too and offers more than the combined functionality of `iconv`(1), `fromdos`(1), `todos`(1), `frommac`(1), and `tomac`(1). For more, see "`info recode`".
81
82
83 ==== Checking file to be UTF-8 with iconv
84
85 You can check if a text file is encoded in UTF-8 with `iconv`(1) by the following.
86
87 --------------------
88 $ iconv -f utf8 -t utf8 input.txt >/dev/null || echo "non-UTF-8 found"
89 --------------------
90
91 TIP: Use "`--verbose`" option in the above example to find the first non-UTF-8 character.
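
The check above can be wrapped in a small shell function for use in scripts. This is only a sketch: the `is_utf8` name is hypothetical, and it merely reports whether a whole file decodes as UTF-8. Plain ASCII files also pass, since ASCII is a subset of UTF-8.

```shell
#!/bin/sh
# is_utf8 FILE   (hypothetical helper)
# Exit status is 0 if FILE decodes cleanly as UTF-8, non-zero otherwise.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Report on every file named on the command line.
for f in "$@"; do
    if is_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

Saved under a name of your choice, the script takes the files to check as its arguments and prints one line per file.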
92
93 ==== Converting file names with iconv
94
Here is an example script to convert the encoding of file names in a single directory from the one used by an older OS to modern UTF-8.
96
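--------------------
#!/bin/sh
# Rename every file in the current directory from $ENCDN to UTF-8.
ENCDN=iso-8859-1
for x in *
do
  # Convert the name; skip the file if iconv fails on it.
  y=$(printf '%s' "$x" | iconv -f "$ENCDN" -t utf-8) || continue
  # Rename only if the name really changes; quoting keeps spaces intact.
  [ "$x" = "$y" ] || mv -- "$x" "$y"
done
--------------------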
105
The "`$ENCDN`" variable should be set to the original encoding of the file names, using an encoding value from <<list-of-encoding-values>>.

For more complicated cases, please mount the filesystem (e.g. a partition on a disk drive) containing such file names with the proper encoding as the `mount`(8) option (see <<_filename_encoding>>) and copy its entire contents to another filesystem mounted as UTF-8 with the "`cp -a`" command.
109
110 ==== EOL conversion
111
112 The text file format, specifically the end-of-line (EOL) code, is dependent on the platform.
113
114 .List of EOL styles for different platforms
115 [grid="all"]
`------------------------`---------`--------`-------`----------
platform                 EOL code  control  decimal hexadecimal
---------------------------------------------------------
Debian (unix)            LF        `\^J`    10      0A
MSDOS and Windows        CR-LF     `\^M\^J` 13 10   0D 0A
Apple@@@sq@@@s Macintosh CR        `\^M`    13      0D
---------------------------------------------------------
123
124 The EOL format conversion programs, `fromdos`(1), `todos`(1), `frommac`(1), and `tomac`(1), are quite handy. `recode`(1) is also useful.
125
126 NOTE: Some data on the Debian system, such as the wiki page data for the `python-moinmoin` package, use MSDOS style CR-LF as the EOL code. So the above rule is just a general rule.
127
NOTE: Most editors (e.g. `vim`, `emacs`, `gedit`, ...) can handle files in MSDOS style EOL transparently.
129
TIP: Use "`sed -e \'/\r$/!s/$/\r/\'`" instead of `todos`(1) when you want to unify a file with mixed MSDOS and Unix EOL styles to the MSDOS style (e.g., after merging 2 MSDOS style files with `diff3`(1)). This is because `todos` adds CR to all lines.
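
The EOL unification can be sketched with two small filters. The `to_dos` and `to_unix` function names are hypothetical. The literal CR is kept in a shell variable as a precaution, on the assumption that not every `sed`(1) implementation understands "`\r`".

```shell
#!/bin/sh
# Keep a literal carriage return in a variable so that the sed
# expressions do not depend on "\r" being understood by sed.
CR=$(printf '\r')

# to_dos: make every line end with CR-LF, adding CR only where it is missing.
to_dos() {
    sed -e "/${CR}\$/!s/\$/${CR}/"
}

# to_unix: strip one trailing CR from every line, leaving LF only.
to_unix() {
    sed -e "s/${CR}\$//"
}

# Demo: the first line is already CR-LF, the second one is LF only.
printf 'dos line\r\nunix line\n' | to_dos | od -c
```

Both functions read standard input and write standard output, so they can be used in a pipeline or with redirections such as "`to_unix <dos.txt >unix.txt`" (file names are examples only).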
131
132 ==== TAB conversion
133
There are a few popular specialized programs to convert the tab codes.
135
136 .List of TAB conversion commands from `bsdmainutils` and `coreutils` packages
137 [grid="all"]
`------------------------`--------------`-----------
function                 `bsdmainutils` `coreutils`
----------------------------------------------------
expand tab to spaces     "`col -x`"     `expand`
unexpand tab from spaces "`col -h`"     `unexpand`
----------------------------------------------------
144
`indent`(1) from the `indent` package completely reformats whitespace in C programs.

Editor programs such as `vim` and `emacs` can be used for TAB conversion, too. For example with `vim`, you can expand TABs with the "`:set expandtab`" and "`:%retab`" command sequence. You can revert this with the "`:set noexpandtab`" and "`:%retab!`" command sequence.
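
For the `coreutils` commands, a minimal sketch follows. The `tabs_to_spaces` and `spaces_to_tabs` function names are hypothetical, and the tab width of 4 is just an example value (both commands default to 8).

```shell
#!/bin/sh
# tabs_to_spaces [WIDTH]: replace each TAB with spaces up to the next tab stop.
tabs_to_spaces() {
    expand -t "${1:-8}"
}

# spaces_to_tabs [WIDTH]: replace runs of blanks that reach a tab stop with a TAB.
# Giving -t makes unexpand convert blanks anywhere in the line, not only leading ones.
spaces_to_tabs() {
    unexpand -t "${1:-8}"
}

# Demo with tab stops every 4 columns; od -c makes the TAB visible as \t.
printf 'a\tb\n'  | tabs_to_spaces 4 | od -c
printf '    x\n' | spaces_to_tabs 4 | od -c
```

Both functions are plain filters, so they can be applied to a whole file with redirections.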
148
149 ==== Editors with auto-conversion
150
Intelligent modern editors such as the `vim` program are quite smart and cope well with any encoding system and any file format. You should use these editors under the UTF-8 locale in a UTF-8 capable console for the best compatibility.
152
153 An old western European Unix text file, "`u-file.txt`", stored in the latin1 (iso-8859-1) encoding can be edited simply with `vim` by the following.
154
155 --------------------
156 $ vim u-file.txt
157 --------------------
158 This is possible since the auto detection mechanism of the file encoding in `vim` assumes the UTF-8 encoding first and, if it fails, assumes it to be latin1.
159
160 An old Polish Unix text file, "`pu-file.txt`", stored in the latin2 (iso-8859-2) encoding can be edited with `vim` by the following.
161
162 --------------------
163 $ vim '+e ++enc=latin2 pu-file.txt'
164 --------------------
165
166 An old Japanese unix text file, "`ju-file.txt`", stored in the eucJP encoding can be edited with `vim` by the following.
167
168 --------------------
169 $ vim '+e ++enc=eucJP ju-file.txt'
170 --------------------
171
An old Japanese MS-Windows text file, "`jw-file.txt`", stored in the so-called shift-JIS encoding (more precisely: CP932) can be edited with `vim` by the following.
173
174 --------------------
175 $ vim '+e ++enc=CP932 ++ff=dos jw-file.txt'
176 --------------------
177
When a file is opened with the "`@@@plus@@@@@@plus@@@enc`" and "`@@@plus@@@@@@plus@@@ff`" options, "`:w`" in the Vim command line stores it in the original format and overwrites the original file. You can also specify the saving format and the file name in the Vim command line, e.g., "`:w @@@plus@@@@@@plus@@@enc=utf8 new.txt`".
179
180 Please refer to the mbyte.txt "multi-byte text support" in `vim` on-line help and <<list-of-encoding-values>> for locale values used with "`++enc`".
181
182 The `emacs` family of programs can perform the equivalent functions.
183
184 //# I do not know easy description for EMACS. Please update this for EMACS.
185
186 ==== Plain text extraction
187
188 The following reads a web page into a text file. This is very useful when copying configurations off the Web or applying basic Unix text tools such as `grep`(1) on the web page.
189
190
191 --------------------
192 $ w3m -dump http://www.remote-site.com/help-info.html >textfile
193 --------------------
194
195 Similarly, you can extract plain text data from other formats using the following.
196
197 .List of tools to extract plain text data
198 [grid="all"]
199 `-----------`-------------`------------`----------------`--------------------------------------------------------------------
200 package popcon size keyword function
201 -----------------------------------------------------------------------------------------------------------------------------
202 `w3m` @-@popcon1@-@ @-@psize1@-@ html->text HTML to text converter with the "`w3m -dump`" command
203 `html2text` @-@popcon1@-@ @-@psize1@-@ html->text advanced HTML to text converter (ISO 8859-1)
204 `lynx` @-@popcon1@-@ @-@psize1@-@ html->text HTML to text converter with the "`lynx -dump`" command
205 `elinks` @-@popcon1@-@ @-@psize1@-@ html->text HTML to text converter with the "`elinks -dump`" command
206 `links` @-@popcon1@-@ @-@psize1@-@ html->text HTML to text converter with the "`links -dump`" command
207 `links2` @-@popcon1@-@ @-@psize1@-@ html->text HTML to text converter with the "`links2 -dump`" command
208 `antiword` @-@popcon1@-@ @-@psize1@-@ MSWord->text,ps convert MSWord files to plain text or ps
209 `catdoc` @-@popcon1@-@ @-@psize1@-@ MSWord->text,TeX convert MSWord files to plain text or TeX
210 `pstotext` @-@popcon1@-@ @-@psize1@-@ ps/pdf->text extract text from PostScript and PDF files
211 `unhtml` @-@popcon1@-@ @-@psize1@-@ html->text remove the markup tags from an HTML file
212 `odt2txt` @-@popcon1@-@ @-@psize1@-@ odt->text converter from OpenDocument Text to text
213 -----------------------------------------------------------------------------------------------------------------------------
214
215 ==== Highlighting and formatting plain text data
216
217 You can highlight and format plain text data by the following.
218
219 .List of tools to highlight plain text data
220 [grid="all"]
221 `------------------`-------------`------------`-----------`-------------------------------------------------------------------------------
222 package popcon size keyword description
223 ------------------------------------------------------------------------------------------------------------------------------------------
224 `vim-runtime` @-@popcon1@-@ @-@psize1@-@ highlight Vim MACRO to convert source code to HTML with "`:source $VIMRUNTIME/syntax/html.vim`"
225 `cxref` @-@popcon1@-@ @-@psize1@-@ c->html converter for the C program to latex and HTML (C language)
226 `src2tex` @-@popcon1@-@ @-@psize1@-@ highlight convert many source codes to TeX (C language)
227 `source-highlight` @-@popcon1@-@ @-@psize1@-@ highlight convert many source codes to HTML, XHTML, LaTeX, Texinfo, ANSI color escape sequences and DocBook files with highlight (C++)
228 `highlight` @-@popcon1@-@ @-@psize1@-@ highlight convert many source codes to HTML, XHTML, RTF, LaTeX, TeX or XSL-FO files with highlight (C++)
229 `grc` @-@popcon1@-@ @-@psize1@-@ text->color generic colouriser for everything (Python)
230 `txt2html` @-@popcon1@-@ @-@psize1@-@ text->html text to HTML converter (Perl)
231 `markdown` @-@popcon1@-@ @-@psize1@-@ text->html markdown text document formatter to (X)HTML (Perl)
232 `asciidoc` @-@popcon1@-@ @-@psize1@-@ text->any AsciiDoc text document formatter to XML/HTML (Python)
233 `python-docutils` @-@popcon1@-@ @-@psize1@-@ text->any ReStructured Text document formatter to XML (Python)
234 `txt2tags` @-@popcon1@-@ @-@psize1@-@ text->any document conversion from text to HTML, SGML, LaTeX, man page, MoinMoin, Magic Point and PageMaker (Python)
235 `udo` @-@popcon1@-@ @-@psize1@-@ text->any universal document - text processing utility (C language)
236 `stx2any` @-@popcon1@-@ @-@psize1@-@ text->any document converter from structured plain text to other formats (m4)
237 `rest2web` @-@popcon1@-@ @-@psize1@-@ text->html document converter from ReStructured Text to html (Python)
238 `aft` @-@popcon1@-@ @-@psize1@-@ text->any "free form" document preparation system (Perl)
239 `yodl` @-@popcon1@-@ @-@psize1@-@ text->any pre-document language and tools to process it (C language)
240 `sdf` @-@popcon1@-@ @-@psize1@-@ text->any simple document parser (Perl)
241 `sisu` @-@popcon1@-@ @-@psize1@-@ text->any document structuring, publishing and search framework (Ruby)
242 ------------------------------------------------------------------------------------------------------------------------------------------
243
244 === XML data
245
246 http://en.wikipedia.org/wiki/XML[The Extensible Markup Language (XML)] is a markup language for documents containing structured information.
247
248 See introductory information at http://xml.com/[XML.COM].
249
250 - http://www.xml.com/pub/a/98/10/guide0.html["What is XML?"]
251 - http://xml.com/pub/a/2000/08/holman/index.html["What Is XSLT?"]
252 - http://xml.com/pub/a/2002/03/20/xsl-fo.html["What Is XSL-FO?"]
253 - http://xml.com/pub/a/2000/09/xlink/index.html["What Is XLink?"]
254
255 ==== Basic hints for XML
256
257 XML text looks somewhat like http://en.wikipedia.org/wiki/HTML[HTML]. It enables us to manage multiple formats of output for a document. One easy XML system is the `docbook-xsl` package, which is used here.
258
Each XML file starts with the standard XML declaration as follows.
260
261 --------------------
262 <?xml version="1.0" encoding="UTF-8"?>
263 --------------------
264
The basic syntax for one XML element is marked up as follows.
266
267 --------------------
268 <name attribute="value">content</name>
269 --------------------
270
An XML element with empty content is marked up in the following short form.
272
273 --------------------
274 <name attribute="value"/>
275 --------------------
276
The "`attribute="value"`" in the above examples is optional.

The comment section in XML is marked up as follows.
280
281 --------------------
282 <!-- comment -->
283 --------------------
284
Other than adding markup, XML requires minor conversion of the content using predefined entities for the following characters.
286
287 .List of predefined entities for XML
288 [grid="all"]
`-----------------`------------------------------
predefined entity character to be converted from
-------------------------------------------------
`&quot;`          `"` : quote
`&apos;`          `\'` : apostrophe
`&lt;`            `<` : less-than
`&gt;`            `>` : greater-than
`&amp;`           `&` : ampersand
-------------------------------------------------
298
CAUTION: "`<`" or "`&`" cannot be used literally in attribute values or element content. Use the predefined entities instead.
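
The conversion to predefined entities can be sketched as a `sed`(1) filter. The `xml_escape` function name is hypothetical. The ampersand is replaced first, since the other replacements themselves introduce ampersands.

```shell
#!/bin/sh
# xml_escape: replace the 5 special characters with predefined XML entities.
# "&" must be handled first; otherwise the "&" of "&lt;" etc. would be
# escaped again by the later expression.
xml_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}

# Demo: escape a line of C code so that it can be used as element content.
printf '%s\n' 'if (a < b && c > 0) puts("x");' | xml_escape
```

The filter should be applied only once to raw text, because applying it to already escaped text escapes the ampersands of the entities again.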
300
NOTE: When SGML style user defined entities, e.g. "`&some-tag;`", are used, the first definition wins over others. The entity definition is expressed in "`<!ENTITY some-tag "entity value">`".

NOTE: As long as the XML markup is done consistently with a certain set of tag names (with data either as content or as attribute values), conversion to another XML format is a trivial task using http://en.wikipedia.org/wiki/XSL_Transformations[Extensible Stylesheet Language Transformations (XSLT)].
304
305 ==== XML processing
306
307 There are many tools available to process XML files such as http://en.wikipedia.org/wiki/Extensible_Stylesheet_Language[the Extensible Stylesheet Language (XSL)].
308
Basically, once you create a well-formed XML file, you can convert it to any format using http://en.wikipedia.org/wiki/XSL_Transformations[Extensible Stylesheet Language Transformations (XSLT)].

The http://en.wikipedia.org/wiki/XSL_Formatting_Objects[Extensible Stylesheet Language for Formatting Objects (XSL-FO)] is supposed to be the solution for formatting. However, the `fop` package is still in the Debian `contrib` (not `main`) archive. So the LaTeX code is usually generated from XML using XSLT, and the LaTeX system is used to create printable files such as DVI, PostScript, and PDF.
312
313
314 .List of XML tools
315 [grid="all"]
316 `-------------`-------------`------------`----------`-------------------------------------------------------------------------------------
317 package popcon size keyword description
318 ------------------------------------------------------------------------------------------------------------------------------------------
319 `docbook-xml` @-@popcon1@-@ @-@psize1@-@ xml XML document type definition (DTD) for DocBook
320 `xsltproc` @-@popcon1@-@ @-@psize1@-@ xslt XSLT command line processor (XML-> XML, HTML, plain text, etc.)
321 `docbook-xsl` @-@popcon1@-@ @-@psize1@-@ xml/xslt XSL stylesheets for processing DocBook XML to various output formats with XSLT
322 `xmlto` @-@popcon1@-@ @-@psize1@-@ xml/xslt XML-to-any converter with XSLT
323 `dblatex` @-@popcon1@-@ @-@psize1@-@ xml/xslt convert Docbook files to DVI, PostScript, PDF documents with XSLT
324 `fop` @-@popcon1@-@ @-@psize1@-@ xml/xsl-fo convert Docbook XML files to PDF
325 ------------------------------------------------------------------------------------------------------------------------------------------
326
Since XML is a subset of http://en.wikipedia.org/wiki/SGML[Standard Generalized Markup Language (SGML)], it can be processed by the extensive tools available for SGML, such as http://en.wikipedia.org/wiki/Document_Style_Semantics_and_Specification_Language[Document Style Semantics and Specification Language (DSSSL)].

.List of DSSSL tools
330 [grid="all"]
331 `---------------`-------------`------------`----------`-----------------------------------------------------------------------------------
332 package popcon size keyword description
333 ------------------------------------------------------------------------------------------------------------------------------------------
334 `openjade` @-@popcon1@-@ @-@psize1@-@ dsssl ISO/IEC 10179:1996 standard DSSSL processor (latest)
335 `openjade1.3` @-@popcon1@-@ @-@psize1@-@ dsssl ISO/IEC 10179:1996 standard DSSSL processor (1.3.x series)
336 `jade` @-@popcon1@-@ @-@psize1@-@ dsssl James Clark@@@sq@@@s original DSSSL processor (1.2.x series)
337 `docbook-dsssl` @-@popcon1@-@ @-@psize1@-@ xml/dsssl DSSSL stylesheets for processing DocBook XML to various output formats with DSSSL
338 `docbook-utils` @-@popcon1@-@ @-@psize1@-@ xml/dsssl utilities for DocBook files including conversion to other formats (HTML, RTF, PS, man, PDF) with `docbook2\*` commands with DSSSL
339 `sgml2x` @-@popcon1@-@ @-@psize1@-@ SGML/dsssl converter from SGML and XML using DSSSL stylesheets
340 ------------------------------------------------------------------------------------------------------------------------------------------
341
342 TIP: http://en.wikipedia.org/wiki/GNOME[GNOME]\'s `yelp` is sometimes handy to read http://en.wikipedia.org/wiki/DocBook[DocBook] XML files directly since it renders decently on X.
343
344 ==== The XML data extraction
345
You can extract HTML or XML data from other formats using the following tools.
347
348 .List of XML data extraction tools
349 [grid="all"]
350 `-----------`-------------`------------`------------------`------------------------------------------------------------------
351 package popcon size keyword description
352 -----------------------------------------------------------------------------------------------------------------------------
353 `wv` @-@popcon1@-@ @-@psize1@-@ MSWord->any document converter from Microsoft Word to HTML, LaTeX, etc.
354 `texi2html` @-@popcon1@-@ @-@psize1@-@ texi->html converter from Texinfo to HTML
355 `man2html` @-@popcon1@-@ @-@psize1@-@ manpage->html converter from manpage to HTML (CGI support)
356 `tex4ht` @-@popcon1@-@ @-@psize1@-@ tex<->html converter between (La)TeX and HTML
357 `xlhtml` @-@popcon1@-@ @-@psize1@-@ MSExcel->html converter from MSExcel .xls to HTML
358 `ppthtml` @-@popcon1@-@ @-@psize1@-@ MSPowerPoint->html converter from MSPowerPoint to HTML
359 `unrtf` @-@popcon1@-@ @-@psize1@-@ rtf->html document converter from RTF to HTML, etc
360 `info2www` @-@popcon1@-@ @-@psize1@-@ info->html converter from GNU info to HTML (CGI support)
361 `ooo2dbk` @-@popcon1@-@ @-@psize1@-@ sxw->xml converter from OpenOffice.org SXW documents to DocBook XML
362 `wp2x` @-@popcon1@-@ @-@psize1@-@ WordPerfect->any WordPerfect 5.0 and 5.1 files to TeX, LaTeX, troff, GML and HTML
363 `doclifter` @-@popcon1@-@ @-@psize1@-@ troff->xml converter from troff to DocBook XML
364 -----------------------------------------------------------------------------------------------------------------------------
365
For non-XML HTML files, you can convert them to XHTML, which is an instance of well-formed XML. XHTML can be processed by XML tools.
367
368 .List of XML pretty print tools
369 [grid="all"]
370 `---------------`-------------`------------`------------------`---------------------------------------------------------------------------
371 package popcon size keyword description
372 ------------------------------------------------------------------------------------------------------------------------------------------
373 `libxml2-utils` @-@popcon1@-@ @-@psize1@-@ xml<->html<->xhtml command line XML tool with `xmllint`(1) (syntax check, reformat, lint, ...)
374 `tidy` @-@popcon1@-@ @-@psize1@-@ xml<->html<->xhtml HTML syntax checker and reformatter
375 -----------------------------------------------------------------------------------------------------------------------------------------------
376
377 Once proper XML is generated, you can use XSLT technology to extract data based on the mark-up context etc.
378
379 === Printable data
380
381 Printable data is expressed in the http://en.wikipedia.org/wiki/PostScript[PostScript] format on the Debian system. http://en.wikipedia.org/wiki/Common_Unix_Printing_System[Common Unix Printing System (CUPS)] uses Ghostscript as its rasterizer backend program for non-PostScript printers.
382
383 ==== Ghostscript
384
The core of printable data manipulation is the http://en.wikipedia.org/wiki/Ghostscript[Ghostscript] http://en.wikipedia.org/wiki/PostScript[PostScript (PS)] interpreter, which generates raster images.

As of the 8.60 release, the upstream Ghostscript from Artifex was re-licensed from AFPL to GPL and merged all the latest changes of the ESP version, such as the CUPS related ones, into a single unified release.
388
389
390 .List of Ghostscript PostScript interpreters
391 [grid="all"]
392 `-------------------`-------------`------------`------------------------------------------------------------------------------------------
393 package popcon size description
394 ------------------------------------------------------------------------------------------------------------------------------------------
395 `ghostscript` @-@popcon1@-@ @-@psize1@-@ http://en.wikipedia.org/wiki/Ghostscript[The GPL Ghostscript PostScript/PDF interpreter]
396 `ghostscript-x` @-@popcon1@-@ @-@psize1@-@ GPL Ghostscript PostScript/PDF interpreter - X display support
397 `gs-cjk-resource` @-@popcon1@-@ @-@psize1@-@ resource files for gs-cjk, Ghostscript http://en.wikipedia.org/wiki/CJK_characters[CJK]-TrueType extension
398 `cmap-adobe-cns1` @-@popcon1@-@ @-@psize1@-@ CMaps for Adobe-CNS1 (for traditional Chinese support)
399 `cmap-adobe-gb1` @-@popcon1@-@ @-@psize1@-@ CMaps for Adobe-GB1 (for simplified Chinese support)
400 `cmap-adobe-japan1` @-@popcon1@-@ @-@psize1@-@ CMaps for Adobe-Japan1 (for Japanese standard support)
401 `cmap-adobe-japan2` @-@popcon1@-@ @-@psize1@-@ CMaps for Adobe-Japan2 (for Japanese extra support)
402 `cmap-adobe-korea1` @-@popcon1@-@ @-@psize1@-@ CMaps for Adobe-Korea1 (for Korean support)
403 `libpoppler13` @-@popcon1@-@ @-@psize1@-@ PDF rendering library based on xpdf PDF viewer
404 `libpoppler-glib6` @-@popcon1@-@ @-@psize1@-@ PDF rendering library (GLib-based shared library)
405 `poppler-data` @-@popcon1@-@ @-@psize1@-@ CMaps for PDF rendering library (for http://en.wikipedia.org/wiki/CJK_characters[CJK] support: Adobe-\*)
406 ----------------------------------------------------------------------------------------------------------------------------------------------------------
407
408 TIP: "`gs -h`" can display the configuration of Ghostscript.
409
410 ==== Merge two PS or PDF files
411
412 You can merge two http://en.wikipedia.org/wiki/PostScript[PostScript (PS)] or http://en.wikipedia.org/wiki/Portable_Document_Format[Portable Document Format (PDF)] files using `gs`(1) of Ghostscript.
413
414 --------------------
415 $ gs -q -dNOPAUSE -dBATCH -sDEVICE=pswrite -sOutputFile=bla.ps -f foo1.ps foo2.ps
416 $ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=bla.pdf -f foo1.pdf foo2.pdf
417 --------------------
418
NOTE: http://en.wikipedia.org/wiki/Portable_Document_Format[PDF], which is a widely used cross-platform printable data format, is essentially the compressed http://en.wikipedia.org/wiki/PostScript[PS] format with a few additional features and extensions.
420
421 TIP: For command line, `psmerge`(1) and other commands from the `psutils` package are useful for manipulating PostScript documents. Commands in the `pdfjam` package work similarly for manipulating PDF documents. `pdftk`(1) from the `pdftk` package is useful for manipulating PDF documents, too.
422
423 ==== Printable data utilities
424
The following packages for the printable data utilities caught my eye.
426
427 .List of printable data utilities
428 [grid="all"]
429 `---------------`-------------`------------`-------------------`--------------------------------------------------------------------------
430 package popcon size keyword description
431 ------------------------------------------------------------------------------------------------------------------------------------------
432 `poppler-utils` @-@popcon1@-@ @-@psize1@-@ pdf->ps,text,... PDF utilities: `pdftops`, `pdfinfo`, `pdfimages`, `pdftotext`, `pdffonts`
433 `psutils` @-@popcon1@-@ @-@psize1@-@ ps->ps PostScript document conversion tools
434 `poster` @-@popcon1@-@ @-@psize1@-@ ps->ps create large posters out of PostScript pages
435 `enscript` @-@popcon1@-@ @-@psize1@-@ text->ps, html, rtf convert ASCII text to PostScript, HTML, RTF or Pretty-Print
436 `a2ps` @-@popcon1@-@ @-@psize1@-@ text->ps \'Anything to PostScript\' converter and pretty-printer
437 `pdftk` @-@popcon1@-@ @-@psize1@-@ pdf->pdf PDF document conversion tool: `pdftk`
438 `mpage` @-@popcon1@-@ @-@psize1@-@ text,ps->ps print multiple pages per sheet
439 `html2ps` @-@popcon1@-@ @-@psize1@-@ html->ps converter from HTML to PostScript
440 `pdfjam` @-@popcon1@-@ @-@psize1@-@ pdf->pdf PDF document conversion tools: `pdf90`, `pdfjoin`, and `pdfnup`
441 `gnuhtml2latex` @-@popcon1@-@ @-@psize1@-@ html->latex converter from html to latex
442 `latex2rtf` @-@popcon1@-@ @-@psize1@-@ latex->rtf convert documents from LaTeX to RTF which can be read by MS Word
443 `ps2eps` @-@popcon1@-@ @-@psize1@-@ ps->eps converter from PostScript to EPS (Encapsulated PostScript)
444 `e2ps` @-@popcon1@-@ @-@psize1@-@ text->ps Text to PostScript converter with Japanese encoding support
445 `impose+` @-@popcon1@-@ @-@psize1@-@ ps->ps PostScript utilities
446 `trueprint` @-@popcon1@-@ @-@psize1@-@ text->ps pretty print many source codes (C, C++, Java, Pascal, Perl, Pike, Sh, and Verilog) to PostScript. (C language)
447 `pdf2svg` @-@popcon1@-@ @-@psize1@-@ pdf->svg converter from PDF to http://en.wikipedia.org/wiki/Scalable_Vector_Graphics[Scalable vector graphics] format
448 `pdftoipe` @-@popcon1@-@ @-@psize1@-@ pdf->ipe converter from PDF to IPE@@@sq@@@s XML format
449 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
450
451 // removed from archive
452 // || {{{rtf2latex}}} || 44 || - || rtf->latex || This converts documents from RTF which can be created by MS Word to LaTeX. ||
453
454 ==== Printing with CUPS
455
456 Both the `lp`(1) and `lpr`(1) commands offered by the http://en.wikipedia.org/wiki/Common_Unix_Printing_System[Common Unix Printing System (CUPS)] provide options for customized printing of printable data.
457
458 You can print 3 copies of a file collated using one of the following commands.
459
460 --------------------
461 $ lp -n 3 -o Collate=True filename
462 --------------------
463
464 --------------------
465 $ lpr -#3 -o Collate=True filename
466 --------------------
467
468 You can further customize printer operation by using printer options such as "`-o number-up=2`", "`-o page-set=even`", "`-o page-set=odd`", "`-o scaling=200`", "`-o natural-scaling=200`", etc., documented at http://localhost:631/help/options.html[Command-Line Printing and Options].
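
The "`page-set`" option can be used, for instance, to print on both sides of the paper with a printer that has no duplex unit. The following is only a sketch: the function name `print_side` is hypothetical, and how the sheets must be reinserted depends on the printer.

```shell
# Hypothetical helper: print only the odd or only the even pages of FILE.
# usage: print_side odd|even FILE
print_side() {
    case $1 in
        odd|even)
            lp -o page-set="$1" "$2"
            ;;
        *)
            echo "usage: print_side odd|even FILE" >&2
            return 2
            ;;
    esac
}
```

Run "`print_side odd filename`", put the printed sheets back into the tray, and then run "`print_side even filename`".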
469
470 === Type setting
471
472 The Unix http://en.wikipedia.org/wiki/Troff[troff] program originally developed by AT&T can be used for simple typesetting. It is usually used to create manpages.
473
474 http://en.wikipedia.org/wiki/TeX[TeX] created by Donald Knuth is a very powerful typesetting tool and is the de facto standard. http://en.wikipedia.org/wiki/LaTeX[LaTeX] originally written by Leslie Lamport enables high-level access to the power of TeX.
475
476 .List of type setting tools
477 [grid="all"]
478 `--------------`-------------`------------`-------`----------------------------------------------------
479 package popcon size keyword description
480 -------------------------------------------------------------------------------------------------------
481 `texlive` @-@popcon1@-@ @-@psize1@-@ (La)TeX TeX system for typesetting, previewing and printing
482 `groff` @-@popcon1@-@ @-@psize1@-@ troff GNU troff text-formatting system
483 -------------------------------------------------------------------------------------------------------
484
485 ==== roff typesetting
486
487 Traditionally, http://en.wikipedia.org/wiki/Roff[roff] is the main Unix text processing system. See `roff`(7), `groff`(7), `groff`(1), `grotty`(1), `troff`(1), `groff_mdoc`(7), `groff_man`(7), `groff_ms`(7), `groff_me`(7), `groff_mm`(7), and "`info groff`".
488
489 You can read or print a good tutorial and reference on "`-me`" http://en.wikipedia.org/wiki/Macro_(computer_science)[macro] in "`/usr/share/doc/groff/`" by installing the `groff` package.
490
491 TIP: "`groff -Tascii -me -`" produces plain text output with http://en.wikipedia.org/wiki/ANSI_escape_code[ANSI escape codes]. If you wish to get manpage-like output with many "^H" and "_", use "`GROFF_NO_SGR=1 groff -Tascii -me -`" instead.
492
493 TIP: To remove "^H" and "_" from a text file generated by `groff`, filter it by "`col -b -x`".
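
The effect of the `col` filter can be checked with a small hand-made sample. This sketch assumes the overstrike conventions of traditional manpage output, where bold is a character struck twice and underline is "_" followed by a backspace and the character; the sample text is invented for illustration.

```shell
# Build a sample: "NAME" in overstrike bold, "foo" in overstrike underline.
sample=$(printf 'N\bNA\bAM\bME\bE\n_\bf_\bo_\bo')

# col -b drops the backspace sequences and keeps the last character written
# to each column; -x outputs spaces instead of tabs.
cleaned=$(printf '%s\n' "$sample" | col -b -x)
printf '%s\n' "$cleaned"
```

The two output lines should read "NAME" and "foo" without any "^H" or "_" left.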
494
495 ==== TeX/LaTeX
496
497 The http://en.wikipedia.org/wiki/TeX_Live[TeX Live] software distribution offers a complete TeX system. The `texlive` metapackage provides a decent selection of the http://en.wikipedia.org/wiki/TeX_Live[TeX Live] packages which should suffice for the most common tasks.
498
499 There are many references available for http://en.wikipedia.org/wiki/TeX[TeX] and http://en.wikipedia.org/wiki/LaTeX[LaTeX].
500
501 - http://www.tldp.org/HOWTO/TeTeX-HOWTO.html[The teTeX HOWTO: The Linux-teTeX Local Guide]
502 - `tex`(1)
503 - `latex`(1)
504 - "The TeXbook", by Donald E. Knuth, (Addison-Wesley)
505 - "LaTeX - A Document Preparation System", by Leslie Lamport, (Addison-Wesley)
506 - "The LaTeX Companion", by Goossens, Mittelbach, Samarin, (Addison-Wesley)
507
508 This is the most powerful typesetting environment. Many http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language[SGML] processors use this as their back end text processor. http://en.wikipedia.org/wiki/Lyx[LyX] provided by the `lyx` package and http://en.wikipedia.org/wiki/GNU_TeXmacs[GNU TeXmacs] provided by the `texmacs` package offer a nice http://en.wikipedia.org/wiki/WYSIWYG[WYSIWYG] editing environment for http://en.wikipedia.org/wiki/LaTeX[LaTeX], while many use http://en.wikipedia.org/wiki/Emacs[Emacs] and http://en.wikipedia.org/wiki/Vim_(text_editor)[Vim] as the choice for the source editor.
509
510 There are many online resources available.
511
512 - The TEX Live Guide - TEX Live 2007 ("`/usr/share/doc/texlive-doc-base/english/texlive-en/live.html`") (`texlive-doc-base` package)
513 - http://www.stat.rice.edu/\~helpdesk/howto/lyxguide.html[A Simple Guide to Latex/Lyx]
514 - http://www-h.eng.cam.ac.uk/help/tpl/textprocessing/latex_basic/latex_basic.html[Word Processing Using LaTeX]
515 - http://supportweb.cs.bham.ac.uk/documentation/LaTeX/lguide/local-guide/local-guide.html[Local User Guide to teTeX/LaTeX]
516
517 // * A Quick Introduction to LaTeX: [http://www.msu.edu/user/pfaffben/writings/]
518
519 // The following needs to be checked.
520
521 When documents become bigger, sometimes TeX may cause errors. You must increase pool size in "`/etc/texmf/texmf.cnf`" (or more appropriately edit "`/etc/texmf/texmf.d/95NonPath`" and run `update-texmf`(8)) to fix this.
522
523 NOTE: The TeX source of "The TeXbook" is available at http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex[http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex].
524
525 This file contains most of the required macros. I heard that you can process this document with `tex`(1) after commenting out lines 7 to 10 and adding "`\input manmac \proofmodefalse`". It@@@sq@@@s strongly recommended to buy this book (and all other books from Donald E. Knuth) instead of using the online version, but the source is a great example of TeX input!
526
527 ==== Pretty print a manual page
528
529 You can print a manual page in PostScript nicely by one of the following commands.
530
531 --------------------
532 $ man -Tps some_manpage | lpr
533 --------------------
534
535 --------------------
536 $ man -Tps some_manpage | mpage -2 | lpr
537 --------------------
538
539 The second example prints 2 pages on one sheet.
540
541 ==== Creating a manual page
542
543 Although writing a manual page (manpage) in the plain http://en.wikipedia.org/wiki/Troff[troff] format is possible, there are a few helper packages to create it.
544
545
546 .List of packages to help creating the manpage
547 [grid="all"]
548 `----------------`-------------`------------`-------------`-----------------------------------------------------
549 package popcon size keyword description
550 ----------------------------------------------------------------------------------------------------------------
551 `docbook-to-man` @-@popcon1@-@ @-@psize1@-@ SGML->manpage converter from DocBook SGML into roff man macros
552 `help2man` @-@popcon1@-@ @-@psize1@-@ text->manpage automatic manpage generator from --help
553 `info2man` @-@popcon1@-@ @-@psize1@-@ info->manpage converter from GNU info to POD or man pages
554 `txt2man` @-@popcon1@-@ @-@psize1@-@ text->manpage convert flat ASCII text to man page format
555 ----------------------------------------------------------------------------------------------------------------
556
557 === The mail data conversion
558
559 The following packages for the mail data conversion caught my eye.
560
561 .List of packages to help mail data conversion
562 [grid="all"]
563 `------------`-------------`------------`------------`------------------------------------------------------------------------------------
564 package popcon size keyword description
565 ------------------------------------------------------------------------------------------------------------------------------------------
566 `sharutils` @-@popcon1@-@ @-@psize1@-@ mail `shar`(1), `unshar`(1), `uuencode`(1), `uudecode`(1)
567 `mpack` @-@popcon1@-@ @-@psize1@-@ MIME encoder and decoder of http://en.wikipedia.org/wiki/MIME[MIME] messages: `mpack`(1) and `munpack`(1)
568 `tnef` @-@popcon1@-@ @-@psize1@-@ ms-tnef unpacking http://en.wikipedia.org/wiki/MIME[MIME] attachments of type "application/ms-tnef" which is a Microsoft only format
569 `uudeview` @-@popcon1@-@ @-@psize1@-@ mail encoder and decoder for the following formats: http://en.wikipedia.org/wiki/Uuencoding[uuencode], http://en.wikipedia.org/wiki/Xxencode[xxencode], http://en.wikipedia.org/wiki/Base64[BASE64], http://en.wikipedia.org/wiki/Quoted-printable[quoted printable], and http://en.wikipedia.org/wiki/BinHex[BinHex]
570 `readpst` @-@popcon1@-@ @-@psize1@-@ PST convert Microsoft http://en.wikipedia.org/wiki/Personal_Folders_(.pst)_file[Outlook PST files] to http://en.wikipedia.org/wiki/Mbox[mbox] format
571 ------------------------------------------------------------------------------------------------------------------------------------------
572
573 TIP: The http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol[Internet Message Access Protocol] version 4 (IMAP4) server (see <<_pop3_imap4_server>>) may be used to move mails out from proprietary mail systems if the mail client software can be configured to use IMAP4 server too.
574
575 ==== Mail data basics
576
577 Mail (http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol[SMTP]) data should be limited to 7 bit. So binary data and 8 bit text data are encoded into 7 bit format with the http://en.wikipedia.org/wiki/MIME[Multipurpose Internet Mail Extensions (MIME)] and the selection of the charset (see <<_basics_of_encoding>>).
578
579 The standard mail storage format is mbox formatted according to http://tools.ietf.org/html/rfc2822[RFC2822 (updated RFC822)]. See `mbox`(5) (provided by the `mutt` package).
580
581 For European languages, "`Content-Transfer-Encoding: quoted-printable`" with the ISO-8859-1 charset is usually used for mail since there are not many 8 bit characters. If European text is encoded in UTF-8, "`Content-Transfer-Encoding: quoted-printable`" is likely to be used since it is mostly 7 bit data.
582
583 For Japanese, traditionally "`Content-Type: text/plain; charset=ISO-2022-JP`" is usually used for mail to keep text in 7 bits. But older Microsoft systems may send mail data in Shift-JIS without proper declaration. If Japanese text is encoded in UTF-8, http://en.wikipedia.org/wiki/Base64[Base64] is likely to be used since it contains much 8 bit data. The situation of other Asian languages is similar.
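
The choice between quoted-printable and Base64 follows from the share of 8 bit bytes in the text. The following sketch counts them for a French and a Japanese sample in UTF-8. The function names `count_8bit` and `total_bytes` and the sample strings are illustrative assumptions, written with octal escapes so the script does not depend on the terminal encoding.

```shell
# Hypothetical helper: count the bytes of $1 that have the high bit set.
count_8bit() {
    printf '%s' "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Hypothetical helper: count all bytes of $1.
total_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Sample strings (assumptions for illustration), both encoded in UTF-8.
european=$(printf 'caf\303\251 au lait')
japanese=$(printf '\346\227\245\346\234\254\350\252\236')

echo "European: $(count_8bit "$european") of $(total_bytes "$european") bytes are 8 bit"
echo "Japanese: $(count_8bit "$japanese") of $(total_bytes "$japanese") bytes are 8 bit"

# Base64 turns the all-8-bit Japanese text into plain 7 bit ASCII.
japanese_b64=$(printf '%s' "$japanese" | base64)
echo "Base64:   $japanese_b64"
```

Only 2 of the 13 bytes of the French sample are 8 bit, so escaping just those bytes is cheap, while all 9 bytes of the Japanese sample are 8 bit.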
584
585 NOTE: If your non-Unix mail data is accessible by a non-Debian client software which can talk to the IMAP4 server, you may be able to move them out by running your own IMAP4 server (see <<_pop3_imap4_server>>).
586
587 NOTE: If you use other mail storage formats, moving them to mbox format is a good first step. A versatile client program such as `mutt`(1) may be handy for this.
588
589 You can split mailbox contents into individual messages using `procmail`(1) and `formail`(1).
590
591 Each mail message can be unpacked using `munpack`(1) from the `mpack` package (or other specialized tools) to obtain the MIME encoded contents.
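
As a rough sketch of what splitting a mailbox means, the following uses `awk`(1) instead of `formail`(1), a plainly swapped-in technique so that the example has no dependencies. It assumes a well-formed mbox in which every message starts with a line beginning with "From ". The sample mailbox and the "`msg.NNN`" output names are invented for illustration.

```shell
# Create a small sample mailbox in a scratch directory (invented data).
workdir=$(mktemp -d)
cat > "$workdir/mbox" <<'EOF'
From alice@example.org Tue Nov 29 16:05:18 2011
Subject: first

Hello.

From bob@example.org Tue Nov 29 16:06:00 2011
Subject: second

Bye.
EOF

# Start a new output file at every "From " separator line and copy each
# line into the file of the message it belongs to.
( cd "$workdir" && awk '
    /^From / { if (out != "") close(out); n++; out = sprintf("msg.%03d", n) }
    out != "" { print > out }
' mbox )

ls "$workdir"
```

Each resulting file holds one message and can then be handed to `munpack`(1) as described above.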
592
593 === Graphic data tools
594
595 The following packages for the graphic data conversion, editing, and organization tools caught my eye.
596
597 .List of graphic data tools
598 [grid="all"]
599 `--------------------------`-------------`------------`----------------------`------------------------------------------------------------
600 package popcon size keyword description
601 ------------------------------------------------------------------------------------------------------------------------------------------
602 `gimp` @-@popcon1@-@ @-@psize1@-@ image(bitmap) GNU Image Manipulation Program
603 `imagemagick` @-@popcon1@-@ @-@psize1@-@ image(bitmap) image manipulation programs
604 `graphicsmagick` @-@popcon1@-@ @-@psize1@-@ image(bitmap) image manipulation programs (fork of `imagemagick`)
605 `xsane` @-@popcon1@-@ @-@psize1@-@ image(bitmap) GTK+-based X11 frontend for SANE (Scanner Access Now Easy)
606 `netpbm` @-@popcon1@-@ @-@psize1@-@ image(bitmap) graphics conversion tools
607 `icoutils` @-@popcon1@-@ @-@psize1@-@ png<->ico(bitmap) convert http://en.wikipedia.org/wiki/ICO_(icon_image_file_format)[MS Windows icons and cursors to and from PNG formats] (http://en.wikipedia.org/wiki/Favicon[favicon.ico])
608 `scribus` @-@popcon1@-@ @-@psize1@-@ ps/pdf/SVG/... http://en.wikipedia.org/wiki/Scribus[Scribus] DTP editor
609 `openoffice.org-draw` @-@popcon1@-@ @-@psize1@-@ image(vector) OpenOffice.org office suite - drawing
610 `inkscape` @-@popcon1@-@ @-@psize1@-@ image(vector) http://en.wikipedia.org/wiki/Scalable_Vector_Graphics[SVG (Scalable Vector Graphics)] editor
611 `dia-gnome` @-@popcon1@-@ @-@psize1@-@ image(vector) diagram editor (GNOME)
612 `dia` @-@popcon1@-@ @-@psize1@-@ image(vector) diagram editor (Gtk)
613 `xfig` @-@popcon1@-@ @-@psize1@-@ image(vector) facility for Interactive Generation of figures under X11
614 `pstoedit` @-@popcon1@-@ @-@psize1@-@ ps/pdf->image(vector) PostScript and PDF files to editable vector graphics converter (SVG)
615 `libwmf-bin` @-@popcon1@-@ @-@psize1@-@ Windows/image(vector) Windows metafile (vector graphic data) conversion tools
616 `fig2sxd` @-@popcon1@-@ @-@psize1@-@ fig->sxd(vector) convert XFig files to OpenOffice.org Draw format
617 `unpaper` @-@popcon1@-@ @-@psize1@-@ image->image post-processing tool for scanned pages for http://en.wikipedia.org/wiki/Optical_character_recognition[OCR]
618 `tesseract-ocr` @-@popcon1@-@ @-@psize1@-@ image->text free http://en.wikipedia.org/wiki/Optical_character_recognition[OCR] software based on the HP@@@sq@@@s commercial OCR engine
619 `tesseract-ocr-eng` @-@popcon1@-@ @-@psize1@-@ image->text OCR engine data: tesseract-ocr language files for English text
620 `gocr` @-@popcon1@-@ @-@psize1@-@ image->text free OCR software
621 `ocrad` @-@popcon1@-@ @-@psize1@-@ image->text free OCR software
622 `gtkam` @-@popcon1@-@ @-@psize1@-@ image(Exif) manipulate digital camera photo files (GNOME) - GUI
623 `gphoto2` @-@popcon1@-@ @-@psize1@-@ image(Exif) manipulate digital camera photo files (GNOME) - command line
624 `kamera` @-@popcon1@-@ @-@psize1@-@ image(Exif) manipulate digital camera photo files (KDE)
625 `jhead` @-@popcon1@-@ @-@psize1@-@ image(Exif) manipulate the non-image part of Exif compliant JPEG (digital camera photo) files
626 `exif` @-@popcon1@-@ @-@psize1@-@ image(Exif) command-line utility to show EXIF information in JPEG files
627 `exiftags` @-@popcon1@-@ @-@psize1@-@ image(Exif) utility to read Exif tags from a digital camera JPEG file
628 `exiftran` @-@popcon1@-@ @-@psize1@-@ image(Exif) transform digital camera jpeg images
629 `exifprobe` @-@popcon1@-@ @-@psize1@-@ image(Exif) read metadata from digital pictures
630 `dcraw` @-@popcon1@-@ @-@psize1@-@ image(Raw)->ppm decode raw digital camera images
631 `findimagedupes` @-@popcon1@-@ @-@psize1@-@ image->fingerprint find visually similar or duplicate images
632 `ale` @-@popcon1@-@ @-@psize1@-@ image->image merge images to increase fidelity or create mosaics
633 `imageindex` @-@popcon1@-@ @-@psize1@-@ image(Exif)->html generate static HTML galleries from images
634 `f-spot` @-@popcon1@-@ @-@psize1@-@ image(Exif) personal photo management application (GNOME)
635 `bins` @-@popcon1@-@ @-@psize1@-@ image(Exif)->html generate static HTML photo albums using XML and EXIF tags
636 `gallery2` @-@popcon1@-@ @-@psize1@-@ image(Exif)->html generate browsable HTML photo albums with thumbnails
637 `outguess` @-@popcon1@-@ @-@psize1@-@ jpeg,png universal http://en.wikipedia.org/wiki/Steganography[Steganographic] tool
638 `qcad` @-@popcon1@-@ @-@psize1@-@ DXF CAD data editor (KDE)
639 `blender` @-@popcon1@-@ @-@psize1@-@ blend, TIFF, VRML, ... 3D content editor for animation etc
640 `mm3d` @-@popcon1@-@ @-@psize1@-@ ms3d, obj, dxf, ... OpenGL based 3D model editor
641 `open-font-design-toolkit` @-@popcon1@-@ @-@psize1@-@ ttf, ps, ... metapackage for open font design
642 `fontforge` @-@popcon1@-@ @-@psize1@-@ ttf, ps, ... font editor for PS, TrueType and OpenType fonts
643 `xgridfit` @-@popcon1@-@ @-@psize1@-@ ttf program for **gridfitting** and **hinting** TrueType fonts
644 ------------------------------------------------------------------------------------------------------------------------------------------
645
646 // || {{{gocr-gtk}}} || 41 || - || image->text || Free OCR software. GTK-GUI. ||
647 // || {{{stegdetect}}} || - || - || jpeg || Detects and extracts [http://en.wikipedia.org/wiki/Steganography steganography] messages inside JPEG ||
648
649 TIP: Search more image tools using regex "`\~Gworks-with::image`" in `aptitude`(8) (see <<_search_method_options_with_aptitude>>).
650
651 Although GUI programs such as `gimp`(1) are very powerful, command line tools such as `imagemagick`(1) are quite useful for automating image manipulation with scripts.
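
For example, a batch of thumbnails can be produced with the `convert` program from the `imagemagick` package. This is only a sketch: the function name `make_thumbnails`, the 200x200 bounding box, and the "`_thumb.jpg`" suffix are arbitrary choices for illustration.

```shell
# Hypothetical helper: write NAME_thumb.jpg next to each given image,
# scaled to fit within 200x200 pixels while keeping the aspect ratio.
make_thumbnails() {
    for img in "$@"; do
        convert "$img" -resize 200x200 "${img%.*}_thumb.jpg"
    done
}
```

It can be called as "`make_thumbnails *.png`" in a directory of images.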
652
653 The de facto image file format of the digital camera is the http://en.wikipedia.org/wiki/Exchangeable_image_file_format[Exchangeable Image File Format] (EXIF) which is the http://en.wikipedia.org/wiki/JPEG[JPEG] image file format with additional metadata tags. It can hold information such as date, time, and camera settings.
654
655 http://en.wikipedia.org/wiki/Lempel-Ziv-Welch[The Lempel-Ziv-Welch (LZW) lossless data compression] patent has expired. http://en.wikipedia.org/wiki/Graphics_Interchange_Format[Graphics Interchange Format (GIF)] utilities which use the LZW compression method are now freely available on the Debian system.
656
657 TIP: Any digital camera or scanner with removable recording media works with Linux through http://en.wikipedia.org/wiki/USB_flash_drive[USB storage] readers since it follows the http://en.wikipedia.org/wiki/Design_rule_for_Camera_File_system[Design rule for Camera Filesystem] and uses http://en.wikipedia.org/wiki/File_Allocation_Table[FAT] filesystem. See <<_removable_storage_device>>.
658
659 === Miscellaneous data conversion
660
661 There are many other programs for converting data. The following packages caught my eye using regex "`\~Guse::converting`" in `aptitude`(8) (see <<_search_method_options_with_aptitude>>).
662
663 .List of miscellaneous data conversion tools
664 [grid="all"]
665 `-----------`-------------`------------`------------`-------------------------------------------------------------------------------------
666 package popcon size keyword description
667 ------------------------------------------------------------------------------------------------------------------------------------------
668 `alien` @-@popcon1@-@ @-@psize1@-@ rpm/tgz->deb converter for the foreign package into the Debian package
669 `freepwing` @-@popcon1@-@ @-@psize1@-@ EB->EPWING converter from "Electric Book" (popular in Japan) to a single http://ja.wikipedia.org/wiki/JIS_X_4081[JIS X 4081] format (a subset of the http://ja.wikipedia.org/wiki/EPWING[EPWING] V1)
670 ------------------------------------------------------------------------------------------------------------------------------------------
671
672 You can also extract data from RPM format with the following.
673
674 --------------------
675 $ rpm2cpio file.src.rpm | cpio --extract
676 --------------------
677
