The Debian SGML/XML HOWTO Stéphane Bortzmeyer The Debian Project
bortzmeyer@debian.org
Spelling and grammar fixes Guy Brand
guybrand@chimie.u-strasbg.fr
Spelling, grammar and style fixes John van der Koijk
jvdkoijk@wirehub.nl
$Id: howto.db,v 1.3 1999-10-13 15:35:03 bortz Exp $
Copyright 1999 Stéphane Bortzmeyer. This text is distributed according to the General Public License.
Why this HOWTO? What's in it?

This section explains why this HOWTO exists and which people it tries to help. It could be useful to read it first, before you lose time.

What is in the HOWTO

This HOWTO contains practical information about the use of SGML and XML on a Debian operating system. The HOWTO is task-oriented: you will see which Debian packages you need for various tasks, and how to use them. It is intended for hurried people, who do not like to read and understand everything before starting and who prefer "hands on" training.

We will cover SGML (and its subset XML), some DTDs which I find important and the tools to write, format and display SGML, whether on the Web or in print. The emphasis will be on SGML as a way to write documentation, not as a general data interchange tool.

What's not in the HOWTO

You will not find anything about installing and setting up software, since we assume a Debian system, where everything is already packaged. We will use only Debian packages, as they are shipped with Debian 2.2, nicknamed 'potato' (not yet released at the time of writing), or with Debian 2.1, nicknamed 'slink'.

This is not a tutorial on SGML or XML. Refer to the References section for that type of information. Instead, you will get just enough SGML to get you started right now.

Meta-information about this HOWTO

This HOWTO is itself written in DocBook (XML) on a Debian system. The HOWTO can be retrieved from my Web page, including its source code.

Why is this HOWTO specific to Debian?

I said the purpose was to start quickly, remember? This means using actual filenames, actual commands and not wasting time compiling jade. And I hate to insert "Your mileage may vary" warnings everywhere. Therefore, I chose a specific operating system and I used the best one, Debian, which is also the only one with an integrated SGML environment... Even if it is not perfect, it works and, with this HOWTO, it even has documentation. I added some pointers to other operating systems.
What you really need to know about SGML

I tried to keep this section short. However, I cannot explain anything without a small basis of concepts about SGML. So, let's go, before we switch to actual source code.

What is structured documentation?

Structured documentation is built upon structured elements: chapters, sections, paragraphs, etc., where all elements are clearly labeled for what they are: references, program output, etc. No explicit information is given about how the document should be rendered; only about its structure (and content). When there are explicit rules for presentation, they are kept outside the SGML document.

This allows for automatic processing of the documents, without waiting for AI systems. It encourages authors to concentrate on structure, which conveys meaning. Thus, the question "How do I put a word in bold with SGML?" has little relevance. One could ask instead how to put emphasis on a certain stretch of text.

What is SGML?

Standard Generalized Markup Language is a standardized language intended to facilitate the authoring of structured documentation (it has other uses, such as data interchange). More specifically, it is a meta-language. You never actually type SGML itself: SGML is used to describe a structured language specific to a document type (this description is called a DTD, a Document Type Definition), which defines how specific documents may be structured (written).

Therefore, saying that a document is "in the SGML format" is technically correct, but deceptive. One should rather say that a document is in the DocBook format or the LinuxDoc format or the TEI format.

What does SGML look like?

SGML is a markup language. All SGML documents include text, mixed with tags, which delimit elements. (Depending on the DTD, the end-tag can be mandatory or not. In XML, end-tags are always mandatory.) SGML allows several syntaxes to be used, but we'll stick with the reference syntax, the most common, where tags are enclosed between angle brackets, < and >.
Here is an example:

    <title>The Foo software</title>
    <para>Foo is very fast. And its documentation
    can be read easily.</para>

If it looks like HTML to you, it is because HTML is (theoretically) a DTD of SGML.

Elements have content. For instance, the content of the above para element is "Foo is very fast. And its documentation can be read easily.". Elements can have attributes to indicate more information. For instance:

    *c++;

You can also have entities, which allow you to parametrize some text. For instance, if you often refer to "the Best Operating System, Debian" and you want to avoid typing it each time or, worse, having to change every occurrence if you finally decide on a more modest wording, you can declare an entity, let's call it "debian", and use it with the ampersand: "&debian;". (These are reference entities. SGML uses other types of entities, which are not covered in this HOWTO.)

One element is special: the root element is the global element, which contains everything. In XML, the DOCTYPE line indicates which element is the root. Here is an example (it seems there is a bug in the SGML environment of Debian 2.2, which requires a full path name for the DTD below; I will investigate it. TODO: a bug against psgml has been filed, follow it):

And the XML files?

You'll learn later about XML. Let's just say that XML files begin with a processing instruction, which starts with <? and, in that case, indicates that it is an XML file, as well as some meta-information. Example:

    <?xml version="1.0"?>

XML files must be well-formed, which means that tags must be balanced (no crossing of tags, which is common in the HTML output of many Web editors), and they can be valid, which means conformant to their DTD. Start-tags must always have an end-tag in XML, but you can have empty elements, where the start-tag and the end-tag merge into a single tag written with a / at the end.

What is a DTD?

A Document Type Definition is the description (in SGML) of a specific language.
You can write your own DTD (it is not very difficult, especially in XML) or you can use an already-existing DTD, which is convenient if you want to exchange documents with other people. Several such DTDs exist, typically for the purposes of a given group of people (astronomers, chemists, scholars in ancient literature...). The DTD lists the allowed elements and their relationships (for instance, it says a chapter must have at least one section).

Typical DTDs that you may find useful:

- DocBook is mostly intended for writing technical documentation, especially about software.
- LinuxDoc is used by the Linux Documentation Project, for instance for the Linux HOWTOs. The LDP has decided to switch to DocBook, but the conversion has not been carried out.
- DebianDoc is used in part by the Debian Documentation Project.
- HTML is in theory an SGML DTD, but very few actual Web pages are compliant. So, most SGML tools will choke on a typical Web page.

At the beginning of a document, you will find a reference to the DTD to use (there are several ways to indicate such references; the following example is for LinuxDoc):
The Linux Kernel HOWTO ]]> </programlisting> <comment>TODO: Explain FPI, PUBLIC and SYSTEM, etc.</comment> </sect2> <sect2> <title>Which DTD to choose?

Very often, you'll have no choice: the project you're a part of will have chosen already. Since standardization is of course very important in a big project, there is little chance you'll be able to change that. For instance, the Linux Documentation Project uses LinuxDoc; FreeBSD, GNOME and KDE use DocBook, etc.

If you have the choice, I suggest staying close to what similar projects are doing. If you write technical documentation for computer hardware or software, this probably means using DocBook.

How do I write SGML?

Since SGML is a markup language, you can use any editor, like vi or even cat. But it is often easier with an editor which helps you insert tags and knows, for example, which ones are valid at a given point. I recommend Emacs with its SGML mode.

What is XML?

XML (Extensible Markup Language) is a subset of SGML, a sort of SGML--. It was designed first for the World-Wide Web, but it is now used in unrelated areas. XML is much simpler than SGML, with fewer options, so a parser is lighter and faster.

What is a stylesheet?

In the markup world, you try to separate content from presentation. Content is expressed in the SGML document, following a given DTD. Presentation is expressed outside of the document, typically in a DTD-specific stylesheet, which is a description, in an appropriate language, of the layout rules for documents written for a certain DTD. The most common such language is DSSSL (Document Style Semantics and Specification Language). The XML world created a new language, XSL, which has few implementations at this moment (and none before Debian 2.2). Despite what you may read in executive summaries, it is perfectly acceptable to use DSSSL to render XML files.

For instance, it is the author of the stylesheet who will decide that titles should be rendered in bold, that URLs will be printed in red, etc.
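To make the separation between content and presentation concrete, here is a toy sketch in Python. It is not DSSSL and has nothing to do with jade: the rule tables, the render function and the sample document are all invented for the illustration. The same content is rendered twice, and only the "stylesheet" (a table with one rule per element name) changes.

```python
import xml.etree.ElementTree as ET

# Content only: no hint about bold, fonts or colours.
DOC = (
    "<article><title>The Foo software</title>"
    "<para>Foo is <emphasis>very</emphasis> fast.</para></article>"
)

# Two toy "stylesheets": one rule per element name.
# These tables are hypothetical and unrelated to DSSSL syntax.
PLAIN_RULES = {
    "title": lambda text: text.upper() + "\n",
    "para": lambda text: text + "\n",
    "emphasis": lambda text: "*" + text + "*",
}
HTML_RULES = {
    "title": lambda text: "<h1>" + text + "</h1>",
    "para": lambda text: "<p>" + text + "</p>",
    "emphasis": lambda text: "<em>" + text + "</em>",
}


def render(element, rules):
    """Render an element recursively, applying the rule for its name.

    Elements without a rule are rendered as their bare content.
    No escaping is done, so this is only suitable for a demonstration.
    """
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, rules))
        parts.append(child.tail or "")
    rule = rules.get(element.tag, lambda text: text)
    return rule("".join(parts))


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(render(root, PLAIN_RULES))
    print(render(root, HTML_RULES))
```

Deciding that emphasis becomes asterisks or an em element is the business of the rule table, not of the document, which is the point of a stylesheet.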
If you know the CSS (Cascading Style Sheets) language, do note that typical languages for SGML stylesheets are more complicated: they allow you not only to specify the rendering of an element, but also to reorder elements, to compute data from some elements, etc. DSSSL, for instance, is a full-blown programming language (based on Scheme), enriched with stylesheet constructs.

Creating documentation with DocBook

Here, we will see how to write and process documentation, using the DocBook DTD. We will use the XML version, often named DocBk, because I prefer XML (and also because future versions of DocBook will be XML), but most of what is written here applies to the SGML version as well.

To use it on a Debian system prior to 2.2 'potato', you'll need the docbook-xml package. It installs fine on a 'slink' system and does not break anything (it is just a DTD, it does not depend on specific libraries).

Writing DocBook

You can skip this section if you just received a DocBook file and want to process it, rather than edit it.

As with any DTD, I recommend Emacs' SGML mode, psgml, to write DocBook.

First, choose a root element, preferably the simplest, article. Start with:
My first XML document
My first section My first paragraph.
This is a complete DocBook document. You can validate it.

Typical DocBook documents use book, chapter or article as the root element. Then, they include a header, where you find meta-information, such as the title of the document. After this header, a DocBook document is divided into sections, each with a title.

More details would be nice.

For the complete list of elements, see the docbook-doc package.
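If you want to play with the notions of root element, entity and well-formedness before installing anything, a few lines of Python are enough. The sketch below uses only the standard library and is not one of the Debian SGML tools. It checks well-formedness only, not validity against the DocBook DTD (no DTD is read). The element names artheader and sect1 are assumptions taken from DocBook 3.1, and the sample article is a hypothetical one written for this illustration; the debian entity is the one suggested earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal article. The element names (artheader, sect1) are
# assumed from DocBook 3.1; no external DTD is referenced, so only
# well-formedness is checked, never validity.
ARTICLE = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY debian "the Best Operating System, Debian">
]>
<article>
  <artheader><title>My first XML document</title></artheader>
  <sect1>
    <title>My first section</title>
    <para>My first paragraph, written on &debian;.</para>
  </sect1>
</article>
"""


def is_well_formed(text):
    """Return True if text parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def outline(text):
    """Return the element names in document order, root first."""
    return [element.tag for element in ET.fromstring(text).iter()]


if __name__ == "__main__":
    print(is_well_formed(ARTICLE))
    print(outline(ARTICLE))
    # Crossing tags, as produced by many Web editors, are rejected.
    print(is_well_formed("<para><emphasis>crossed</para></emphasis>"))
```

The first name returned by outline is the root element, the one named on the DOCTYPE line. For real validation against the DTD, use nsgmls as in the validate target of the Makefile further down.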
Processing DocBook documents

Remember, DocBook is not a program but a format. Asking "Does DocBook have a PDF output?" is meaningless. Software which uses DocBook may produce PDF. DocBook itself does nothing.

There are several different solutions to produce printed paper, Web pages or manual pages from DocBook documents. You could program such a transformation yourself with tools like the Perl module XML::Parser or the Java module XP. Or you can use stylesheets, which you may or may not write yourself. If you decide not to write them, you can use the Modular DocBook Stylesheets with jade.

Since we are using the XML version of DocBook, here is how to call jade to translate myfile.db to TeX:

    jade -t tex \
      -d &print_ss; \
      /usr/lib/sgml/declaration/xml.decl myfile.db

which will produce a TeX file using jadetex macros and needing the jadetex program to be processed:

    jadetex myfile.tex

And to HTML:

    jade -t sgml \
      -d &html_ss; \
      /usr/lib/sgml/declaration/xml.decl myfile.db

Unfortunately, there is no easy way to create text-only output from a DocBook file, for instance for posting it on Usenet. The best available solution is to use the following kluge with lynx:

    jade -t sgml -V nochunks \
      -d &html_ss; \
      /usr/lib/sgml/declaration/xml.decl myfile.db > dump.html
    lynx -force_html -dump dump.html > myfile.txt

Using SGMLtools

You can also use SGMLtools, version 2. This may be simpler, since SGMLtools automates the tasks performed by jade, jadetex and lynx. But it does not work with the XML version of DocBook.

To convert a file to HTML:

    sgmltools --backend=html howto.db

And to PostScript:

    sgmltools --backend=ps howto.db

And to pure text:

    sgmltools --backend=txt howto.db

Automate it with <application>make</application>

Since the manipulations needed to convert from DocBook to anything can be complicated, the use of make is recommended. An example of a Makefile is:

# to recurse jadetex
# "just enough".
	-cp -pf prior.aux pprior.aux
	-cp -pf $(shell basename $< .tex).aux prior.aux
	jadetex $<
	if ! cmp $(shell basename $< .tex).aux prior.aux && \
	   ! cmp $(shell basename $< .tex).aux pprior.aux && \
	   expr $(MAKELEVEL) '<' $(MAX_TEX_RECURSION); then \
		rm -f $@ ;\
		$(MAKE) $@ ;\
	fi
	rm -f prior.aux pprior.aux

myfile.ps: myfile.dvi
	dvips -f $< > $@

myfile.html: myfile.db html.dsl
	jade -t sgml \
	  -d $(HTML_SS) \
	  $(XML_DECL) $<

myfile.txt: myfile.db
	jade -t sgml -V nochunks \
	  -d $(HTML_SS) \
	  $(XML_DECL) $< > dump.html
	lynx -force_html -dump dump.html > $@
	-rm -f dump.html

validate:
	nsgmls -s -wxml $(XML_DECL) myfile.db

clean:
	rm -f *.html *.aux *.log *.dvi *.ps *.tex *.txt

Misc

TODO: localization in various languages.

To convert DocBook to man pages or other formats, see docbook2man and docbook-to-man-ans.

Customizing the Modular DocBook Stylesheets

If you write a custom element, if you want to change the default rendering of an element, or if you simply want to customize the output a bit (such as changing the default font), you'll have to define a custom stylesheet.

This does not imply retyping everything. DSSSL allows one stylesheet to "use" another. The stylesheet inherits all of the properties of the stylesheet that it is using, but local definitions take precedence over imported ones.

An example of a custom stylesheet is:

    ]>
    (define %body-font-family%
      ;; The font family used in body text
      "Palatino")

Your style instructions (here the changing of the font to Palatino) have to be written in DSSSL, whose syntax and much of its semantics come from the programming language Scheme, which is itself a Lisp dialect. You do not need to learn Scheme: the docbook-stylesheets (or docbook-stylesheets-doc) package contains examples for most purposes.

Since there are actually two stylesheets, one for printing and one for HTML, the above custom stylesheet works only for the first one.
For the second, here is an example:

    ]>
    (define %generate-article-titlepage% #t)

In both cases, you'll have to tell Jade to use your stylesheets, here myprint.dsl:

    jade -t tex \
      -d myprint.dsl \
      /usr/lib/sgml/declaration/xml.decl myfile.db

Customizing the DocBook DTD

DocBook is intended to be customizable. There are many ways to do that (including copying the DTD and editing it... but I was referring to `clean' ways of modifying the DTD, which will not create too many problems with future versions of DocBook), but be careful: customization may lead to problems when exchanging documents with others. See docbook-doc.

If you add new elements, you'll probably have to create a custom stylesheet as well.

TODO: give examples of customization.
Creating documentation with LinuxDoc

We will now go into writing and processing documentation using the LinuxDoc DTD.

Writing LinuxDoc

You can skip this section if you just received a LinuxDoc file (for instance one of the Linux HOWTOs, such as you can find on the LinuxDoc servers).

You may write LinuxDoc documents with Emacs' SGML mode, psgml. Here is a sample:
Quick SGML Example
<author>Matt Welsh, <tt>mdw@cs.cornell.edu</tt>
<date>v1.0, 28 March 1994
<abstract>
This document is a brief example using the Linuxdoc-SGML DTD.
</abstract>
<sect>Introduction
<p>
This is an SGML example file using the Linuxdoc-SGML DTD.
</article>
]]> </programlisting>
<para>A more complete example of a LinuxDoc document is <debiandoc file="example.sgml.gz">sgml-tools</debiandoc>.</para>
<para>To learn the list of legal elements, see <debiandoc file="html/guide.html.gz">sgml-tools</debiandoc><phrase debianversionmax="2.2"> (it is currently buggy: the HTML files are compressed, which may harm your browser) </phrase><comment>TODO Does it work? It seems HTML files are gzipped :-( File a bug report</comment> or see <ulink url="http://www.sgmltools.org/guide/guide.html">Matt Welsh's guide</ulink>.</para>
</sect2> <sect2> <title>Processing LinuxDoc

You will use sgml-tools, version 1.

To convert a LinuxDoc document to HTML:

    sgml2html document.sgml

To ordinary text, for instance to post it on the News:

    sgml2txt document.sgml

And to PostScript, using LaTeX:

    sgml2latex --output=ps document.sgml

The extension has to be .sgml or sgml-tools will misbehave.

You can find more information in sgmltools(1) or sgmltools.v1(1).

TODO: localization in various languages.

Creating documentation with DebianDoc

Here, we will see how to write and process documentation, using the DebianDoc DTD.

Writing DebianDoc documents

Here is a sample DebianDoc document:

FooBar Bortzmeyer bortzmeyer@debian.org Title

Content

For the list of legal tags, see debiandoc-sgml or debiandoc-sgml-doc (Bug #47300).
Processing DebianDoc documents

To translate to PostScript:

    debiandoc2ps -1 myfile.dd

And to HTML:

    debiandoc2html myfile.dd
Tools

This section is no longer oriented toward tasks, but toward software that you can use to write and process SGML.

The simplest way to get all these tools is to install task-sgml. Where that package is not available, you'll have to install several packages. Here is the apt command which will do it for you (provided that apt has been configured properly beforehand):

    apt-get install docbook docbook-doc sp jade \
      docbook-stylesheets jadetex debiandoc-sgml \
      psgml

<debianpackage name="psgml" refserver="http://www.lysator.liu.se/projects/about_psgml.html">PSGML</debianpackage>

An excellent SGML mode for Emacs. Among its many features, it can:

- Show you what tags are valid at a given point,
- Insert tags (begin and end, as well as mandatory tags in between) from a menu which shows only valid tags (this is tremendously useful when you start to use a new and complicated DTD),
- Manipulate SGML elements, move according to elements, etc.

Its documentation is in psgml.

Having some options set up in your ~/.emacs will ease your use of psgml. Here are some examples:

    " )
    ( "HTML 4.0 Blaireau" "" )
    ( "DocBook 3.1 XML Article" " " )
    ))
    (setq sgml-insert-missing-element-comment nil)

Among the most useful psgml commands:

- C-c C-t : sgml-list-valid-tags reminds you of (or teaches you) the DTD. Very convenient when you start playing with a monster like DocBook.
- sgml-insert-element: again, a great way to learn a DTD.

<debianpackage name="sp" refserver="http://www.jclark.com/sp/nsgmls.htm">nsgmls</debianpackage>

An SGML tool, for instance for validating SGML documents. A typical use is to check the validity of a document:

    nsgmls -s file.sgml

This will check whether or not the contents of the file file.sgml conform to the DTD indicated in the header of the file.

If you write XML documents, two more arguments are necessary, the -wxml option and the XML declaration file:

    nsgmls -s -wxml /usr/lib/sgml/declaration/xml.decl file.sgml

Documentation is in the sp package; since nsgmls is a part of sp, the general sp documentation may be useful too.
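If you validate from a script rather than from make, the two nsgmls invocations above can be wrapped in a few lines of Python. This is only a sketch: the function names are invented for the illustration, and it assumes that nsgmls reports errors on standard error and exits with a non-zero status when the document is invalid. Check both assumptions against your version of sp before relying on it.

```python
import shutil
import subprocess

# XML declaration path as given in this HOWTO for a Debian system.
XML_DECL = "/usr/lib/sgml/declaration/xml.decl"


def nsgmls_command(path, xml=False):
    """Build the nsgmls command line used in this HOWTO.

    -s suppresses the normal output, so only error messages remain.
    For XML, -wxml and the XML declaration file are added.
    """
    command = ["nsgmls", "-s"]
    if xml:
        command += ["-wxml", XML_DECL]
    command.append(path)
    return command


def validate(path, xml=False):
    """Run nsgmls on path and return (ok, messages).

    Assumption: a non-zero exit status means the document is not valid
    and the messages are written to standard error.
    """
    if shutil.which("nsgmls") is None:
        raise RuntimeError("nsgmls not found; is the sp package installed?")
    result = subprocess.run(
        nsgmls_command(path, xml), capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    print(" ".join(nsgmls_command("file.sgml")))
    print(" ".join(nsgmls_command("myfile.db", xml=True)))
```

Building the command as a list, instead of a single string passed to a shell, avoids quoting problems with unusual file names.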
<debianpackage>rxp</debianpackage>

A pure XML tool; it can, for instance, be used to validate XML documents.

<debianpackage refserver="http://www.jclark.com/jade/">jade</debianpackage>

TODO: We should mention OpenJade! http://jade-cvs.avionitek.com/

jade is a DSSSL processor. It takes an SGML file and a stylesheet, written in the DSSSL language, and produces output in the TeX (from which PostScript can be made), RTF or HTML formats. It has no backend for groff and therefore has trouble producing ASCII. The TeX backend produces jadetex files.

The documentation is not really clear but it at least tells you the various options. See jade.

Typical use:

    jade -t backend-to-use -d stylesheet-name input-file

<debianpackage>jadetex</debianpackage> http://www.tug.org/applications/jadetex/

A set of TeX macros to process the output of jade. Poorly documented and difficult to customize. As with all TeX macros, several runs may be necessary, in particular to resolve references.

<debianpackage name="sgmltoolsv2" refserver="http://www.sgmltools.org/">SGMLtools</debianpackage>

The SGMLtools exist in two versions, 1 and 2. SGMLtools is version 2. Unlike sgml-tools, version 1, which processes LinuxDoc documents, SGMLtools, version 2, processes DocBook documents. You can do everything it does with direct calls to jade, but it may be simpler to use SGMLtools.

<debianpackage name="sgml-tools" refserver="http://www.sgmltools.org/download-1.0.html">sgml-tools, version 1</debianpackage>

Did you notice the change in the capitalization? This version is officially deprecated and should, in theory, no longer be used. But, in practice, since the move of the Linux Documentation Project from LinuxDoc to the DocBook DTD never occurred, you still need sgml-tools version 1.
<debianpackage name="docbook-stylesheets" refserver="http://www.nwalsh.com/docbook/dsssl/index.html"> Norman Walsh's "DocBook Modular Stylesheets"</debianpackage>

This is a set of DSSSL stylesheets (with a recent XSL version). You can use them with any DSSSL tool, like jade, to process DocBook documents.

References

SGML in general

- Cover's page
- A Gentle Introduction to SGML, by the TEI people. Not very practical, IMHO. TODO: read it again, Sam.

XML in general

- Official XML
- Cover's XML page
- XML FAQ

DocBook

- Official DocBook
- docbook-doc
- Modular DocBook Stylesheets
- FreeBSD Documentation Project Primer, a nice introduction to SGML and DocBook
- Simplified DocBook, a version of DocBook with fewer elements to learn

LinuxDoc

- Matt Welsh's SGML-Tools User's Guide

Other operating systems

This section will list documents similar to this HOWTO (I mean practical documents) for operating systems other than Debian.

- Microsoft Windows NT
- RedHat users of DocBook should probably see the Cygnus tools.

Interesting books

- SGML CD, Bob DuCharme, Prentice-Hall, ISBN 0-13-475740-8. A very good and practical book about the tools needed to write and process SGML on Unix and Windows NT. Does not cover XML.
- DocBook: The Definitive Guide, Norman Walsh and Leonard Muellner, O'Reilly, ISBN 1-56592-580-7. I have not read it yet...