Contents of /utils/debiandoc2dbxml/dd2xml.html



Revision 567
Tue Dec 17 22:10:31 2002 UTC by osamu
File MIME type: text/html
File size: 12544 byte(s)
Updated documentation to match new script
1 osamu 550 <html>
2     <head>
3     <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
4     <meta name="Author" content="Osamu Aoki">
5     <meta name="GENERATOR" content="VIM">
6     <title>debiandoc-sgml to docbook-xml conversion</title>
7     </head>
8     <body>
9     <h1>debiandoc-sgml to docbook-xml conversion</h1>
10     <pre>
11     ==========================================================================
12     (original concept) Philippe Batailler &lt;pbatailler@teaser.fr&gt;
13     (original concept) Adam DiCarlo &lt;aph@debian.org&gt;
14 osamu 560 (ghost writer) Osamu Aoki &lt;osamu@debian.org&gt;
15 osamu 550 Sat Dec 14 00:54:21 2002
16     ==========================================================================
17     </pre>
18     </p>
19    
20     <h2><a name="toc">Table of contents</a></h2>
21     <p>
22     <ul>
23     <li><a href="#why">Why convert?</a></li>
24     <li><a href="#how">How to read this?</a></li>
25     <li><a href="#step">Step by step guide.</a></li>
26     <li><a href="#tags">How are tags converted?</a></li>
27     </ul>
28     </p>
29     <hr>
30    
31     <h2><a name="why">Why convert?</a></h2>
32     <p>
33     Because it is cool to be XML :)
34     </p>
35     <hr>
36    
37     <h2><a name="how">How to read this?</a></h2>
38     <p>
39     Use a table-capable web browser if you are reading the HTML version.
40     </p>
41     <ul>
42     <li>Mozilla</li>
43     <li>Galeon</li>
44     <li>links</li>
45     <li>w3m</li>
46     <li>...</li>
47     </ul>
48     <hr>
49    
50     <h2><a name="step">Step by step guide.</a></h2>
51     <p>
52     This is a rehash of a tutorial that Philippe Batailler gave to Osamu Aoki
53     through private e-mails in 2002.
54     </p>
55     <p>
56     To convert debiandoc-sgml into docbook-xml, take the following
57     steps:
58     </p>
59    
60     <p>
61     <ol>
62     <li>Install debiandoc2docbookxml
63 osamu 551 <!--
64 osamu 550 <p>
65     Get a file <tt>Debiandoc2docbookxml.tar.gz</tt> from
66     <a href="http://www.teaser.fr/~pbatailler/">Philippe Batailler's web site</a>.
67     Then untar it and copy contents to the root of SGML file source.
68     </p>
69     <p>
70     Note: Something like following should work once this is packaged into deb.
71     <pre>
72     $ su -c "apt-get update && apt-get install ???debian2docbookxml???"
73     </pre>
74     </p>
75 osamu 551 -->
76     <p>
77     Get the scripts from the DDP CVS server:
78     <pre>
79 osamu 567 $ cd $HOME
80     $ echo 'export PATH="$HOME/Debiandoc-to-docbook:${PATH}"' >> ~/.bash_profile
81 osamu 554 $ . ~/.bash_profile
82 osamu 551 $ export CVSROOT=:pserver:anonymous@cvs.debian.org:/cvs/debian-doc
83     $ cvs login
84 osamu 567 $ cvs co -d Debiandoc-to-docbook utils/debiandoc-to-docbook # ***
85     $ cd Debiandoc-to-docbook
86 osamu 551 $ make
87 osamu 554 $ cd /some/debiandoc/sgml/source-directory/ # use of mc is easy way :)
88 osamu 551 </pre>
89 osamu 567 ***) If you checked out from the old CVS location, please commit all
90 osamu 561 changes and check out the new trees in a different location.
91 osamu 567 I will remove the old CVS location soon.
92 osamu 551 </p>
93 osamu 558
94     <li>Manually make the source file compatible with the script
95 osamu 559 <p>
96     The conversion script has some limitations. If you experience problems
97     converting files, consider the source touch-up rules below.
98 osamu 560 The script may already handle some of these issues, but applying
99     the rules anyway does no harm.
100 osamu 559 </p>
101 osamu 550 <ul>
102 osamu 560 <li>Adjust SGML header lines (first few lines of the file)
103 osamu 559 <p>If foo.sgml includes many files (subset of dtd), the first line must end
104     with <tt>[</tt>, as in:
105 osamu 550 <pre>
106     &lt;!DOCTYPE debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN" [
107 osamu 560 ... content
108 osamu 553 ]&gt;
109 osamu 550 </pre>
110 osamu 553 Splitting the opening or the closing line of this section across several lines will make the conversion fail.
111 osamu 550 </p>
112     <p>
113     If foo.sgml is a single file, the header is:
114     <pre>
115     &lt;!DOCTYPE debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN"&gt;
116     </pre>
117     </p>
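<p>
The two header forms above can be told apart mechanically. The following is
a minimal Python sketch, not part of the conversion scripts; the function name
<tt>has_dtd_subset</tt> is hypothetical, and it assumes the DOCTYPE
declaration is the first non-blank line of the file.
</p>

```python
def has_dtd_subset(sgml_text):
    """Return True when the first non-blank line ends with '['.

    Assumption: the DOCTYPE declaration is the first non-blank line,
    as in the two header examples above.
    """
    for line in sgml_text.splitlines():
        if line.strip():
            return line.rstrip().endswith("[")
    return False


multi = ('<!DOCTYPE debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN" [\n'
         '<!ENTITY x "y">\n'
         ']>\n')
single = '<!DOCTYPE debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN">\n'

print(has_dtd_subset(multi), has_dtd_subset(single))
```

<p>
A file for which this returns true is one that needs the "subset of dtd"
handling described in this guide.
</p>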
118    
119     <li>Keep the opening of some conditionals within one line.
120     <p>
121     <pre>
122     &lt;![%bar;[
123     </pre>
124     </p>
125    
126     <li>Keep some tags within one line.
127     <p>
128     <pre>
129     &lt;chapt&gt;...&lt;/chapt&gt;
130     &lt;appendix&gt;...&lt;/appendix&gt;
131     &lt;sect&gt;...&lt;/sect&gt;
132     &lt;sect1&gt;...&lt;/sect1&gt;
133     &lt;sect2&gt;...&lt;/sect2&gt;
134     </pre>
135     </p>
136    
137     <li>Remove some comments
138     <p>
139     <pre>
140     &lt;!-- ... --&gt;
141     </pre>
142     </p>
143     <p>
144     This keeps the script from malfunctioning on a "<tt> ]&gt; </tt>" inside a comment.
145     </p>
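<p>
Removing only the risky comments can be sketched in Python as below. This is
an illustration under stated assumptions, not the script's own method: the
name <tt>strip_risky_comments</tt> is hypothetical, and only the simple
<tt>&lt;!-- ... --&gt;</tt> comment form is handled (comments embedded in
declarations are not).
</p>

```python
import re

# Simple SGML/XML comment form; DOTALL lets a comment span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_risky_comments(sgml_text):
    """Drop comments that contain ']>' and leave all other comments alone."""
    def replace(match):
        comment = match.group(0)
        return "" if "]>" in comment else comment

    return COMMENT.sub(replace, sgml_text)


sample = "<!-- keep me --><p>text<!-- old subset ]> here -->"
print(strip_risky_comments(sample))
```

<p>
Review the result by hand before converting, since a removed comment may
have carried notes worth keeping elsewhere.
</p>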
146 osamu 558
147 osamu 560 <li>Make attributes such as "id" a single token.
148 osamu 550 <p>
149     <pre>
150     wrong: &lt;book id="foo bar"&gt;
151     correct: &lt;book id="foo_bar"&gt;
152     </pre>
153     </p>
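<p>
The id touch-up can also be done mechanically. The Python sketch below is one
possible approach and makes several assumptions: the attribute is written as
<tt>id="..."</tt> with double quotes, whitespace is replaced by an underscore
as in the example above, and the function name <tt>normalize_ids</tt> is
hypothetical.
</p>

```python
import re

# Matches id="..." only when "id" is a whole attribute name.
ID_ATTR = re.compile(r'\bid="([^"]*)"')


def normalize_ids(sgml_text):
    """Replace each whitespace run inside an id value with one underscore."""
    def replace(match):
        token = re.sub(r"\s+", "_", match.group(1).strip())
        return 'id="{}"'.format(token)

    return ID_ATTR.sub(replace, sgml_text)


print(normalize_ids('<book id="foo bar">'))
```

<p>
Because every <tt>id="..."</tt> occurrence gets the same rewrite, elements
that point at an id stay consistent with the element that defines it.
</p>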
154 osamu 558
155 osamu 550 </ul>
156     </p>
157    
158     <li>Normalize SGML to an XML-compatible format (debiandoc-tidy)
159     <p>
160     <pre>
161 osamu 557 $ debiandoc-tidy foo.sgml
162     $ debiandoc-tidy -e bar.ent
163 osamu 550 </pre>
164     </p>
165    
166     <li>Convert SGML tags into XML tags (debiandoc2docbookxml)
167     <p>
168     If foo.sgml is a small article in a single file without a subset of dtd:
169     <pre>
170 osamu 557 $ debiandoc2docbookxml -a foo.sgml
171 osamu 550 </pre>
172     </p>
173     <p>
174     If foo.sgml is a larger book in a single file without a subset of dtd:
175     <pre>
176 osamu 557 $ debiandoc2docbookxml -b foo.sgml
177 osamu 550 </pre>
178     </p>
179     <p>
180     If foo.sgml is a larger book with many included files (subset of dtd):
181     <pre>
182 osamu 557 $ debiandoc2docbookxml -s -b foo.sgml
183 osamu 550 </pre>
184     </p>
185     <p>
186 osamu 556 Each of the commands above produces a single large foo.xml.
187 osamu 550 </p>
188 osamu 556 <p>
189 osamu 565 If foo.sgml is a larger book with many included files (subset of dtd) and
190     you want split-file output in <tt>bar/</tt>:
191 osamu 556 <pre>
192 osamu 565 $ mkdir bar ; cd bar
193     $ debiandoc2docbookxml -S -s -b ../foo.sgml
194     $ cd ..
195 osamu 556 </pre>
196     </p>
197     <p>
198 osamu 565 This produces a foo.xml together with many file chunks under <tt>bar/</tt>.
199 osamu 556 </p>
200     <p>
201 osamu 565 For debugging, use "-k" to keep intermediate files and use
202     "-t" to trace shell activities.
203 osamu 556 </p>
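<p>
The option combinations described in this step can be summarized as a small
Python helper that only builds the argument list and does not run anything.
This is a sketch: the function name <tt>build_command</tt> and its keyword
names are hypothetical, and the relative order of "-k" and "-t" with respect
to the other options is an assumption.
</p>

```python
def build_command(source, kind="book", subset=False, split=False,
                  keep=False, trace=False):
    """Return the argument list for one debiandoc2docbookxml run."""
    if kind not in ("article", "book"):
        raise ValueError("kind must be 'article' or 'book'")
    args = ["debiandoc2docbookxml"]
    if keep:
        args.append("-k")   # keep intermediate files (debugging)
    if trace:
        args.append("-t")   # trace shell activities (debugging)
    if split:
        args.append("-S")   # split-file output in the current directory
    if subset:
        args.append("-s")   # source has many included files (subset of dtd)
    args.append("-a" if kind == "article" else "-b")
    args.append(source)
    return args


print(" ".join(build_command("../foo.sgml", subset=True, split=True)))
```

<p>
The returned list can be passed to <tt>subprocess.run</tt> once the scripts
are on the <tt>PATH</tt>.
</p>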
204 osamu 558
205     <li>Test it with emacs and psgml, or nsgmls:
206 osamu 550 <p>
207     <pre>
208     $ nsgmls -s /usr/share/sgml/declaration/xml.decl foo.xml
209     </pre>
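<p>
Where nsgmls is not at hand, a rough first check is possible with the Python
standard library. This sketch tests well-formedness only, not validity
against the DocBook DTD, so it is weaker than the nsgmls run above; the
function name <tt>is_well_formed</tt> is hypothetical. Entity references that
the parser cannot resolve are also reported as failures.
</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True when xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<book><title>t</title></book>"))
print(is_well_formed("<book><title>t</book>"))
```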
210 osamu 558 <li>Format source for readability
211 osamu 553 <p>
212     To make the source more readable, some reformatting may be a good idea.
213 osamu 557 For example, to add a newline after &lt;/listitem&gt;:
214 osamu 553 <pre>
215 osamu 557 $ perl -i -p -e's,&lt;/listitem&gt;,&lt;/listitem&gt;\n,g' foo.xml
216 osamu 553 </pre>
217     </p>
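<p>
A variation on the one-liner above, sketched in Python: it adds the newline
only where one does not already follow, so running it twice does not pile up
blank lines. The function name <tt>break_after_listitem</tt> is hypothetical,
and this is an alternative illustration rather than a replacement for the
Perl command.
</p>

```python
import re


def break_after_listitem(xml_text):
    """Insert a newline after each </listitem> not already followed by one."""
    return re.sub(r"</listitem>(?!\n)", "</listitem>\n", xml_text)


sample = "<listitem>a</listitem><listitem>b</listitem>\n"
print(break_after_listitem(sample), end="")
```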
218    
219 osamu 550 <li>Building output
220     <p>
221     There are a few strategies for building output.
222     <table border="1">
223     <tr>
224     <td>
225     Stylesheet
226     </td>
227     <td>
228 osamu 560 Back end
229 osamu 550 </td>
230     </tr>
231     <tr>
232     <td>DSSSL</td>
233     <td>jade and jadetex</td>
234     </tr>
235     <tr>
236     <td>CSS</td>
237     <td>mozilla?</td>
238     </tr>
239     <tr>
240     <td>XSL</td>
241     <td>passivetex?</td>
242     </tr>
243     </table>
244     <p>
245 osamu 551 Needs more documentation for creating files (plain text, multi-file,
246     HTML, PS, PDF).
247 osamu 550 </p>
248     </ol>
249     </p>
250     <hr>
251     <h2><a name="tags">How are tags converted?</a></h2>
252     <p>
253     Here is a list of tag conversions from debiandoc-sgml to docbook-xml.
254 osamu 560 The columns mean the following:
255 osamu 550
256     <ul>
257     <li>"debiandoc-sgml tag" lists the tags used in the original documents.</li>
258     <li>"converted docbook-xml tag" lists the tags produced programmatically
259     by XSLT.</li>
260     <li>"alternative docbook-xml tag" lists alternative tags that a human may
261 osamu 560 substitute in places by editing the docbook-xml source later.</li>
262 osamu 550 </ul>
263     </p>
264    
265     <table border="1">
266     <tr>
267     <td><p>original debiandoc-sgml tag</p></td>
268     <td><p>converted docbook-xml tag using XSLT</p></td>
269     <td><p>alternative docbook-xml tag</p></td>
270     </tr>
271     <tr>
272     <td><p>book</p></td>
273     <td>
274     <p>book (-b option)</p>
275     <p>article (-a option)</p>
276     </td>
277     <td><p></p></td>
278     </tr>
279     <tr>
280     <td><p>title</p></td>
281     <td><p>title</p></td>
282     <td><p></p></td>
283     </tr>
284     <tr>
285     <td><p>author</p></td>
286     <td><p>author</p></td>
287     <td><p></p></td>
288     </tr>
289     <tr>
290     <td><p>name</p></td>
291     <td><p>firstname + surname</p></td>
292     <td><p></p></td>
293     </tr>
294     <tr>
295     <td><p>email</p></td>
296     <td>
297     <p>affiliation + address + email (in author element)</p>
298     <p>email (other places)</p>
299     </td>
300     <td><p></p></td>
301     </tr>
302     <tr>
303     <td><p>version</p></td>
304     <td><p>releaseinfo</p></td>
305     <td><p></p></td>
306     </tr>
307     <tr>
308     <td><p>abstract</p></td>
309     <td><p>abstract + para</p></td>
310     <td><p></p></td>
311     </tr>
312     <tr>
313     <td><p>copyright</p></td>
314     <td><p>copyright</p></td>
315     <td><p></p></td>
316     </tr>
317     <tr>
318     <td><p>toc</p></td>
319     <td>
320     <p>(presentation tool takes care)</p>
321     <p>(stylesheet is needed?) (oa)</p>
322     </td>
323     <td><p></p></td>
324     </tr>
325     <tr>
326     <td><p>chapt</p></td>
327 osamu 551 <td>
328     <p>chapter (-b option)</p>
329     <p>section (-a option)</p>
330     </td>
331 osamu 550 <td><p></p></td>
332     </tr>
333     <tr>
334     <td><p>appendix</p></td>
335     <td><p>appendix</p></td>
336     <td><p></p></td>
337     </tr>
338     <tr>
339     <td><p>sect</p></td>
340     <td><p>section</p></td>
341     <td><p></p></td>
342     </tr>
343     <tr>
344     <td><p>sect1</p></td>
345     <td><p>section</p></td>
346     <td><p></p></td>
347     </tr>
348     <tr>
349     <td><p>sect2</p></td>
350     <td><p>section</p></td>
351     <td><p></p></td>
352     </tr>
353     <tr>
354     <td><p>sect3</p></td>
355     <td><p>section</p></td>
356     <td><p></p></td>
357     </tr>
358     <tr>
359     <td><p>sect4</p></td>
360     <td><p>section</p></td>
361     <td><p></p></td>
362     </tr>
363     <tr>
364     <td><p>p</p></td>
365     <td><p>para</p></td>
366     <td><p></p></td>
367     </tr>
368     <tr>
369     <td><p>em</p></td>
370     <td><p>emphasis</p></td>
371     <td><p></p></td>
372     </tr>
373     <tr>
374     <td><p>strong</p></td>
375     <td>
376     <p>emphasis role="strong" (aph)</p>
377     <p>emphasis role="bold" (pb)</p>
378     </td>
379     <td><p>emphasis role="important"</p></td>
380     </tr>
381     <tr>
382     <td><p>var</p></td>
383     <td><p>replaceable</p></td>
384     <td><p></p></td>
385     </tr>
386     <tr>
387     <td><p>package</p></td>
388     <td><p>systemitem role="package"</p></td>
389     <td><p></p></td>
390     </tr>
391     <tr>
392     <td><p>prgn</p></td>
393     <td><p>command</p></td>
394     <td><p>??? (what to use for well known file w/o path)</p></td>
395     </tr>
396     <tr>
397     <td><p>file</p></td>
398     <td><p>filename</p><p>filename class="directory" (if it ends with /)</p></td>
399     <td><p>filename class="directory"</p></td>
400     </tr>
401     <tr>
402     <td><p>tt</p></td>
403     <td><p>literal</p></td>
404     <td>
405     <p>command (this should have been prgn but many documents do this)</p>
406     <p>constant</p>
407     <p>computeroutput</p>
408     <p>envar</p>
409     <p>function</p>
410     <p>keycap</p>
411     <p>keycode</p>
412     <p>keycombo</p>
413     <p>keysym</p>
414     <p>markup</p>
415     <p>option</p>
416     <p>parameter</p>
417     <p>prompt</p>
418     <p>property</p>
419     <p>returnvalue</p>
420     <p>sgmltag</p>
421     <p>symbol</p>
422     <p>token</p>
423     <p>userinput</p>
424     <p>varname</p>
425     <p>wordasword</p>
426     <p>(do we need all these? are they all in docbook-simple?)</p></td>
427     </tr>
428     <tr>
429     <td><p>qref</p></td>
430     <td><p>link</p></td>
431     <td><p>citation ?</p></td>
432     </tr>
433     <tr>
434     <td><p>ref</p></td>
435     <td>
436     <p>xref (empty element)</p>
437     </td>
438     <td><p></p></td>
439     </tr>
440     <tr>
441     <td><p>manref</p></td>
442     <td><p>citerefentry + refentrytitle + manvolnum</p></td>
443     <td><p></p></td>
444     </tr>
445     <tr>
446     <td><p>ftpsite (old)</p></td>
447     <td><p>(convert original tag to url in debiandoc source)</p></td>
448     <td><p></p></td>
449     </tr>
450     <tr>
451     <td><p>ftppath (old)</p></td>
452     <td><p>(convert original tag to url in debiandoc source)</p></td>
453     <td><p></p></td>
454     </tr>
455     <tr>
456     <td><p>httpsite (old)</p></td>
457     <td><p>(convert original tag to url in debiandoc source)</p></td>
458     <td><p></p></td>
459     </tr>
460     <tr>
461     <td><p>httppath (old)</p></td>
462     <td><p>(convert original tag to url in debiandoc source)</p></td>
463     <td><p></p></td>
464     </tr>
465     <tr>
466     <td><p>url</p></td>
467     <td><p>ulink</p></td>
468     <td><p></p></td>
469     </tr>
470     <tr>
471     <td><p>footnote</p></td>
472     <td><p>footnote</p></td>
473     <td><p></p></td>
474     </tr>
475     <tr>
476     <td><p>list</p></td>
477     <td><p>itemizedlist</p></td>
478     <td><p></p></td>
479     </tr>
480     <tr>
481     <td><p>list compact</p></td>
482     <td><p>itemizedlist spacing="compact"</p></td>
483     <td><p></p></td>
484     </tr>
485     <tr>
486     <td><p>enumlist</p></td>
487     <td><p>orderedlist</p></td>
488     <td><p></p></td>
489     </tr>
490     <tr>
491     <td><p>enumlist compact</p></td>
492     <td><p>orderedlist spacing="compact"</p></td>
493     <td><p></p></td>
494     </tr>
495     <tr>
496     <td><p>taglist</p></td>
497     <td><p>variablelist</p></td>
498     <td><p></p></td>
499     </tr>
500     <tr>
501     <td><p>taglist compact</p></td>
502     <td><p>variablelist (there is no "spacing" attribute)</p></td>
503     <td><p>(possibly converting to table)</p></td>
504     </tr>
505     <tr>
506     <td><p>item</p></td>
507     <td><p>listitem + para</p></td>
508     <td><p></p></td>
509     </tr>
510     <tr>
511     <td><p>tag</p></td>
512     <td><p>varlistentry + term</p></td>
513     <td><p></p></td>
514     </tr>
515     <tr>
516     <td><p>example</p></td>
517     <td><p>screen</p></td>
518 osamu 551 <td>
519     <p>literallayout class="monospaced"</p>
520     <p></p>
521     </td>
522 osamu 550 </tr>
523     <tr>
524     <td><p>heading</p></td>
525     <td><p>title</p></td>
526     <td><p></p></td>
527     </tr>
528     <tr>
529     <td><p>comment</p></td>
530     <td><p>remark</p></td>
531     <td>
532     <p>caution</p>
533     <p>tip</p>
534     <p>warning</p>
535     <p>note</p>
536     </td>
537     </tr>
538     <tr>
539     <td><p>comment/p</p></td>
540     <td><p>phrase</p></td>
541     <td><p></p></td>
542     </tr>
543    
544     <tr>
545     <td><p>*HTML* (table)</p></td>
546     <td><p></p></td>
547     <td><p></p></td>
548     </tr>
549    
550     <tr>
551     <td><p>*HTML* (tr)</p></td>
552     <td><p></p></td>
553     <td><p></p></td>
554     </tr>
555    
556     <tr>
557     <td><p>*HTML* (th)</p></td>
558     <td><p></p></td>
559     <td><p></p></td>
560     </tr>
561    
562     <tr>
563     <td><p>*HTML* (td)</p></td>
564     <td><p></p></td>
565     <td><p></p></td>
566     </tr>
567    
568     <tr>
569     <td><p>*HTML* (img src)</p></td>
570     <td><p></p></td>
571     <td><p></p></td>
572     </tr>
573    
574     <tr>
575     <td><p>?</p></td>
576     <td><p></p></td>
577     <td><p></p></td>
578     </tr>
579    
580     </table>
581    
582     <p>
583     The <strong>*HTML*</strong> entries above are not real tags in
584     debiandoc-sgml; they stand for missing features that would be needed to
585     create the corresponding HTML tags.
586 osamu 556 </p>
587    
588 osamu 550 <p>
589 osamu 556 File splitting has a funny bug that creates titletoc.xml in the scripts
590 osamu 560 directory. Also, multi-file XML requires entries like:
591 osamu 558 <pre>
592     &lt;!ENTITY titletoc SYSTEM "en/titletoc.sgml"&gt;
593     </pre>
594     Currently, this is a manual process.
595 osamu 556 </p>
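<p>
Until that is automated, the entity lines can be generated rather than typed.
The Python sketch below assumes that each entity is named after the base name
of its file, which matches the single example above but is otherwise a guess;
the function name <tt>entity_declarations</tt> is hypothetical, and the
resulting names are not checked for validity.
</p>

```python
import posixpath


def entity_declarations(paths):
    """Return one <!ENTITY name SYSTEM "path"> line per file path.

    Assumption: the entity name is the file's base name without extension.
    """
    lines = []
    for path in paths:
        name = posixpath.splitext(posixpath.basename(path))[0]
        lines.append('<!ENTITY {} SYSTEM "{}">'.format(name, path))
    return "\n".join(lines)


print(entity_declarations(["en/titletoc.sgml"]))
```

<p>
The output is meant to be pasted into the DTD subset at the top of the main
file.
</p>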
596 osamu 550
597     </body>
598     </html>
