The Debian SGML/XML HOWTO

Stéphane Bortzmeyer, The Debian Project, bortzmeyer@debian.org
Spelling and grammar fixes: Guy Brand, guybrand@chimie.u-strasbg.fr
Spelling, grammar and style fixes: John van der Koijk, jvdkoijk@wirehub.nl

$Id: howto.db,v 1.11 1999-11-20 13:31:39 bortz Exp $

Copyright 1999 Stéphane Bortzmeyer. This text is distributed according to the General
Public License.

Why this HOWTO? What's in it?

This section explains why this HOWTO exists and which people
it tries to help. It could be useful to read it first, before you
lose time.

What is in the HOWTO

This HOWTO contains practical
information about the use of SGML and XML on a Debian operating
system.

The HOWTO is task-oriented: you will see what Debian packages you will need for
various tasks, and how to use them. It is intended for hurried
people, who do not like to read and understand everything before
starting
and who prefer "hands on" training.

We will cover SGML (and its subset XML), some DTDs which I
find important and the tools to write, format and display SGML,
whether on the Web or in printing. The emphasis will be on SGML
as a way to write documentation, not as a general data
interchange tool.

What's not in the HOWTO

You will not find anything about installing and setting up
software, since we assume a Debian system, where everything is
already packaged. We will use only Debian packages, as they are
shipped with Debian 2.2, nicknamed 'potato' (not yet released at the
time of writing), or Debian 2.1, nicknamed
'slink'.

This is not a tutorial on SGML or XML. Refer to
the References section for that type of information.
Instead, you will get just enough
SGML to get you started
right now.

Meta-information about this HOWTO

This HOWTO is itself written in DocBook (XML) on a Debian
system. The HOWTO can be retrieved from my Web
page, including its source code.

Why is this HOWTO specific to Debian?

I said the purpose was to start quickly, remember? This
means using actual filenames, actual commands and not wasting
time compiling jade. And I hate to insert "Your mileage may
vary" warnings everywhere. Therefore, I chose a specific
operating
system and I used the best one, Debian, which is also the
only one with an integrated SGML environment... Even if it is not
perfect, it works and, with this HOWTO, it even has
documentation. (I added some pointers to other
operating systems in the References section.)

What you really need to know about SGML

I tried to keep this section short. However, I cannot
explain anything without a small basis of concepts about SGML. So,
let's go, before we switch to actual source code.

What is structured documentation?

Structured documentation is built upon structured elements:
chapters, sections, paragraphs, etcetera, where all elements
are clearly labeled for
what they are: references, program output, etc. No explicit information
about how the document should be rendered is given;
only about its structure (and content).
When there are explicit rules for presentation, they are kept
outside the SGML document.

This allows for automatic processing of the documents, without
waiting for AI systems. It encourages authors to concentrate on
structure, which conveys meaning. Thus, the question "How do I put a word in bold with
SGML?" has little relevance. One could ask instead
how to put emphasis on a certain stretch of text.

What is SGML?

Standard Generalized Markup Language is a standardized
language intended to facilitate the authoring of
structured documentation (it has
other uses, such as data interchange). More
specifically, it is a meta-language. You never actually type
SGML; rather, SGML is used to describe
a structured language specific to a document type
(this description is called a DTD, a Document Type
Definition), which defines how specific documents may be
structured (written).

Therefore, saying that a document is "in the SGML format"
is technically correct, but deceptive. One could say that a
document is in the DocBook format or the LinuxDoc format or the
TEI format.

What does SGML look like?

SGML is a markup language. All SGML documents include
text, mixed with tags, which delimit
elements (depending on the DTD, the
end-tag can be mandatory or not; in XML, end-tags are always mandatory). SGML allows several syntaxes to be
used, but we'll stick with the reference syntax, the most
common, where tags are enclosed between angle brackets, < and
>. Here is an example:
<title>The Foo software</title>
<para>Foo is very fast. And its documentation can be read easily.</para>
If it looks like HTML to you, it is because HTML is
(theoretically) a DTD of SGML.

Elements have content. For
instance, the content of the above para
element is "Foo is very fast.
And its documentation can be read easily.".

Elements can have attributes to
indicate more information. For instance:
*c++;
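To make the attribute syntax concrete, here is a small shell sketch. The file name attr-example.sgml and the value given to the id attribute are made up for the illustration; only the general form, name="value" inside the start-tag, matters.

```shell
# Write a tiny fragment in which the start-tag of "para" carries an
# attribute named "id" (file name and attribute value are hypothetical).
cat > attr-example.sgml <<'EOF'
<para id="foo-speed">Foo is very fast.</para>
EOF

# Print the attribute found in the start-tag, as a quick check.
grep -o 'id="[^"]*"' attr-example.sgml
```

The attribute lives in the start-tag only; the end-tag never repeats it.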
You can also have entities, which
allow you to parametrize some text. For instance, if you often
refer to "the Best Operating System, Debian" and you want to
avoid typing it each time or, worse, having to change every
occurrence if you finally decide on a more modest wording, you
can declare an entity, let's call it "debian", and use it with
the ampersand: "&debian;". (These are
reference entities. SGML uses other types
of entities, which are not covered in this HOWTO.)

One element is special: the root
element is the global element, which contains
everything. In XML, the DOCTYPE line indicates which element
is the root. (It
seems there is a bug in the SGML environment of Debian 2.2,
which requires a full path name for the DTD in the DOCTYPE line. If so,
this is a bug and I will investigate it. TODO: do
it, a bug against psgml has been filed. Follow it.)

And the XML files?

You'll learn later about
XML. Let's just say that XML files begin with a
processing instruction, which starts with
<? and, in this case, indicates that it is an XML file, as well as
some meta-information.
XML files must be well-formed,
which means that tags must be balanced (no crossing of tags,
which is common in the HTML output of many Web editors), and
can be valid, which means conformant to
their DTD.

Start-tags must always have an end-tag in XML, but you
can have empty elements, where the
start-tag and the end-tag merge into a single tag written with a / at
the end.
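Putting the pieces of this section together, the following shell sketch writes a small XML file containing an XML declaration, a DOCTYPE line naming the root element, an entity declared in the internal subset, an attribute and an empty element. The file name, the id value and the entity text are illustrative assumptions. No external DTD is referenced, so the file is well-formed but cannot be checked for validity.

```shell
# Write a small, well-formed XML file (all names here are illustrative).
cat > sample.xml <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY debian "the Best Operating System, Debian">
]>
<article>
  <title>The Foo software</title>
  <para id="intro">Foo runs on &debian;.</para>
  <para>See <xref linkend="intro"/> for details.</para>
</article>
EOF

# Show the DOCTYPE line, which names the root element.
grep '<!DOCTYPE' sample.xml
```

The quoted 'EOF' delimiter keeps the shell from touching the ampersand, so &debian; reaches the file unchanged and is expanded later by the XML parser.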

What is a DTD?

A Document Type Definition is the description (in SGML)
of a specific language. You can write your own DTD (it is not
very difficult, especially in XML) or you can use an
already-existing DTD, which is convenient if you want to
exchange documents with other people. Several such DTDs exist,
typically for the purposes of a given group of people (astronomers, chemists, scholars in ancient literature...).

The DTD lists the allowed elements and their
relationships (for instance, it says a chapter
must have at least one section).

Typical DTDs that you may find useful:

- DocBook is mostly intended for
  writing technical documentation, especially about software.
- LinuxDoc
  is used by the Linux Documentation Project, for instance for
  the Linux
  HOWTOs. The LDP has decided to switch to DocBook, but the
  conversion has not been carried out.
- DebianDoc is used in part by the
  Debian Documentation Project.
- HTML is
  in theory an SGML DTD but
  very few actual Web pages are compliant. So, most SGML tools
  will choke on a typical Web page.

At the beginning of a document, you will find a
reference to the DTD to use (there are several ways to indicate
such references; the following example is for LinuxDoc):
<!doctype linuxdoc system>
<article>
<title>The Linux Kernel HOWTO
TODO: Explain FPI, PUBLIC and SYSTEM, etc.

Which DTD to choose?

Very often, you'll have no choice: the project you're a
part of will have chosen already. Since standardization is of
course very important in a big project, there is little chance
you'll be able to change that. For instance, the Linux Documentation
Project uses LinuxDoc; FreeBSD,
GNOME and KDE use DocBook, etc.

If you have the choice, I suggest staying close to what
similar projects are doing. If you write technical documentation
for computer hardware or software, this probably means using
DocBook.

How do I write SGML?

Since SGML is a markup language, you can use any editor,
like vi or even
cat.

But it is often easier with an editor which helps you
insert tags, knowing, for example, which ones are valid. I recommend
Emacs with its SGML mode.

What is XML?

XML (Extensible Markup Language) is a subset of SGML, a
sort of SGML--. It
was designed first for the World-Wide Web, but it is now used
in unrelated areas.

XML is much simpler than SGML, with fewer options, so a parser is
lighter and faster.

What is a stylesheet?

In the markup world, you try to separate content from
presentation. Content is expressed in the SGML document,
following a given DTD. Presentation is expressed outside of the
document, typically in a DTD-specific stylesheet, which is a description, in an
appropriate language, of the layout rules for documents written for a certain DTD.
DSSSL (Document Style Semantics and
Specification Language) is the most common such language. (The
XML world created a new language, XSL, which has few
implementations at this moment,
and none before Debian 2.2.
Despite what you may read in executive
summaries, it is perfectly acceptable to use DSSSL to render XML
files.)

For instance, it is the author of the stylesheet who will
decide that titles should be rendered in bold, that URLs will be
printed in red, etc.

If you know the
CSS (Cascading
Style Sheets) language, do note that typical languages for SGML
stylesheets are more complicated: they allow you not only to specify
the rendering of an element, but also the reordering of elements,
computation of data from some elements, etc. DSSSL, for instance, is a
full-blown programming language (based on Scheme), enriched with stylesheet
constructs.

Creating documentation with DocBook

Here, we will see how to write and process documentation,
using the DocBook DTD. We will use the XML version, often named
DocBk, because
I prefer XML (and also because future versions of
DocBook will be XML), but most of what is written here applies to the SGML
version as well.

To use it on a Debian system prior to
2.2 'potato', you'll need the
docbook-xml package. It installs fine on a
'slink' system and does not break anything (it is just a DTD, it
does not depend on specific libraries).

Writing DocBook

You can skip this section if you just received a DocBook
file and want to process it, rather than edit it.

As with any DTD, I recommend Emacs' SGML mode, psgml, to write DocBook. First, choose a root element, preferably the simplest,
article. Start with a document that has a title ("My first XML document"), one section
("My first section") and one paragraph ("My first paragraph.").
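A starting file of this kind could be created as sketched below. The element names (article, title, sect1, para) are standard DocBook, but the DOCTYPE line is an assumption: the public and system identifiers shown are those of the OASIS DocBook XML V4.1.2 release, so substitute the ones matching the DTD actually installed on your system.

```shell
# Write a minimal DocBook XML article to myfile.db.
# The DOCTYPE identifiers are an assumption (OASIS DocBook XML V4.1.2);
# replace them with those of the DTD installed on your system.
cat > myfile.db <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<article>
  <title>My first XML document</title>
  <sect1>
    <title>My first section</title>
    <para>My first paragraph.</para>
  </sect1>
</article>
EOF
```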
This is a complete DocBook document. You can
validate it with nsgmls. (Just ignore the warning "/usr/lib/sgml/declaration/xml.decl:1:W: SGML declaration was not implied".)

Typical DocBook documents use book, chapter or article as the
root element. Then, they include a header, where you find meta-information,
such as the title of the document. After this header, a DocBook
document is divided into sections, each with a title.
TODO: More details would be nice.

To know the complete list of elements, see
the docbook-doc package.

Processing DocBook documents

Remember, DocBook is not a program but a format. Asking
"Does DocBook have a PDF output?" is meaningless. Software which
uses DocBook may produce PDF. DocBook itself does nothing.

There are several different solutions to produce printed
paper, Web pages or manual pages from DocBook documents. You could
program such a transformation yourself with tools like the Perl module
XML::Parser or the Java module XP. Or you can use stylesheets,
which you may or may not write yourself. If you decide not to
write them, you can use
the Modular DocBook Stylesheets with jade.

Since we are using the XML version of DocBook, here is how to
call jade to translate myfile.db to TeX:
jade -t tex -V tex-backend \
-d $PRINT_SS \
/usr/lib/sgml/declaration/xml.decl myfile.db
Here $PRINT_SS stands for the path of the print stylesheet. This
will produce a TeX file using jadetex macros and
needing the jadetex program to be processed:
jadetex myfile.tex
And to HTML ($HTML_SS being the path of the HTML stylesheet):
jade -t sgml \
-d $HTML_SS \
/usr/lib/sgml/declaration/xml.decl myfile.db
Unfortunately, there is no easy way to create text-only
output from a DocBook file, for instance for posting it on
Usenet. The best available solution is to use the following
kluge with
lynx:
jade -t sgml -V nochunks \
-d $HTML_SS \
/usr/lib/sgml/declaration/xml.decl myfile.db > dump.html
lynx -force_html -dump dump.html > myfile.txt

Using SGMLtools

You can also use SGMLtools, version 2. This may be
simpler, since SGMLtools, version 2, automates the tasks performed by
jade, jadetex and lynx. But it does not work with the XML
version of DocBook. To convert a file to HTML:
sgmltools --backend=html howto.db
And to PostScript:
sgmltools --backend=ps howto.db
And to pure text:
sgmltools --backend=txt howto.db

Automate it with make

Since the manipulations needed to convert from
DocBook to anything can be complicated, the use of
make
is recommended. An example of a Makefile is:
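# Assumed values; adjust them to your system.
MAX_TEX_RECURSION = 4
XML_DECL = /usr/lib/sgml/declaration/xml.decl
HTML_SS = html.dsl
PRINT_SS = myprint.dsl

.PHONY: all validate clean

all: myfile.ps myfile.html myfile.txt

# Assumed rule: produce the TeX file with jade, as shown earlier.
myfile.tex: myfile.db $(PRINT_SS)
	jade -t tex -V tex-backend \
		-d $(PRINT_SS) \
		$(XML_DECL) $<

# Trick to recurse jadetex
# "just enough".
myfile.dvi: myfile.tex
	-cp -pf prior.aux pprior.aux
	-cp -pf $(shell basename $< .tex).aux prior.aux
	jadetex $<
	if ! cmp $(shell basename $< .tex).aux prior.aux && \
	   ! cmp $(shell basename $< .tex).aux pprior.aux && \
	   expr $(MAKELEVEL) '<' $(MAX_TEX_RECURSION); then \
		rm -f $@ ;\
		$(MAKE) $@ ;\
	fi
	rm -f prior.aux pprior.aux

myfile.ps: myfile.dvi
	dvips -f $< > $@

myfile.html: myfile.db html.dsl
	jade -t sgml \
		-d $(HTML_SS) \
		$(XML_DECL) $<

myfile.txt: myfile.db
	jade -t sgml -V nochunks \
		-d $(HTML_SS) \
		$(XML_DECL) $< > dump.html
	lynx -force_html -dump dump.html > $@
	-rm -f dump.html

validate:
	nsgmls -s -wxml $(XML_DECL) myfile.db

clean:
	rm -f *.html *.aux *.log *.dvi *.ps *.tex *.txt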

Localization (l10n) of the output

Localization (often written l10n to save space) is the
adaptation to a different language. Let's take French (whose
ISO code is "fr") as an example; DocBook can be l10n'ed for
other languages too (see
/usr/lib/sgml/stylesheet/dsssl/docbook/nwalsh/common/dbl1*
for a list).

With the XML version, you have two ways to tell the
language:

- Using the lang
  attribute (TODO:
  http://nwalsh.com/docbook/dsssl/doc/custom.html#AEN190
  seems wrong).
  You will then get labels ("Table of contents", "Next", "Previous", etc.)
  in French.
- In the custom stylesheet:
  (define %default-language% "fr")
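As an illustration of the first method, the sketch below writes a DocBook article whose root element carries lang="fr". Putting the attribute on the root element, the file name and the French text are assumptions made for the example, and the DOCTYPE line is left out for brevity.

```shell
# Write a short article marked as French through the lang attribute.
# File name and content are hypothetical; add the DOCTYPE line of your
# DocBook DTD before processing the file.
cat > fr-example.db <<'EOF'
<?xml version="1.0"?>
<article lang="fr">
  <title>Mon premier document</title>
  <para>Mon premier paragraphe.</para>
</article>
EOF
```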

The lang attribute seems to be ignored by slink's software. With slink,
you can set the language in the
custom stylesheet:
(define %default-language% "fr")
which will get you labels ("Table of contents", "Next", "Previous", etc.)
in French.

It is not a complete l10n: hyphenation in the TeX output
will not be correct, for instance.

A bug in the packages in the
"slink" version will produce jadetex warnings:
l.101 \select@language{francais}
! Package babel Error: You haven't defined the language francais yet.
See the babel package documentation for explanation.
which you can ignore.

Misc

To convert DocBook to man pages or other formats, see docbook2man
and docbook-to-man-ans.

Customizing the Modular DocBook Stylesheets

If you write a custom
element,
or if you want to change the default rendering of
an element, or if you simply want to customize the output a bit
(such as changing the default font), you'll have to define a
custom stylesheet. This does not imply retyping everything.
DSSSL allows one stylesheet to "use" another. The stylesheet
inherits all of the properties of the stylesheet that it is
using, but local definitions take precedence over imported ones.
An example of a definition in a custom stylesheet is:
(define %body-font-family%
  ;; The font family used in body text
  "Palatino")
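For reference, a complete custom print stylesheet wrapping such a definition could be generated as sketched below. The wrapper follows the usual DSSSL layout, in which one style-specification "uses" an external specification. The path assigned to PRINT_SS is an assumption based on the stylesheet directory mentioned in the localization section; check where the stock print/docbook.dsl lives on your system.

```shell
# Assumed location of the stock print stylesheet; adjust to your system.
PRINT_SS=/usr/lib/sgml/stylesheet/dsssl/docbook/nwalsh/print/docbook.dsl

# Write a custom stylesheet that imports the stock one and overrides
# a single parameter. $PRINT_SS is expanded by the shell on purpose.
cat > myprint.dsl <<EOF
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY dbstyle SYSTEM "$PRINT_SS" CDATA DSSSL>
]>
<style-sheet>
<style-specification use="docbook">
<style-specification-body>

(define %body-font-family%
  ;; The font family used in body text
  "Palatino")

</style-specification-body>
</style-specification>
<external-specification id="docbook" document="dbstyle">
</style-sheet>
EOF
```

The use="docbook" attribute refers to the id of the external-specification element, which in turn points at the dbstyle entity, that is, the stock stylesheet file.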
Your style instructions (here the changing of the font
to Palatino) have to be written in DSSSL, whose syntax and
much of whose semantics come from the programming language Scheme,
which is itself a Lisp dialect. You do not need to learn
Scheme: the docbook-stylesheets or docbook-stylesheets-doc package
contains examples for most purposes.

Since there are actually two stylesheets, one for printing and
one for HTML, the above custom stylesheet works only for the
first one. For the second, here is an example:
(define %generate-article-titlepage% #t)
In both cases, you'll have to tell Jade to use your
stylesheets, here myprint.dsl:
jade -t tex -V tex-backend \
-d myprint.dsl \
/usr/lib/sgml/declaration/xml.decl myfile.db

Customizing the DocBook DTD

DocBook is intended to be customizable. There are many ways
to do that (including copying the DTD and
editing it, but I am referring to "clean" ways of modifying the
DTD, which will not create too many problems with
future versions of DocBook),
but be careful: customization may lead to problems
when exchanging documents with others. See docbook-doc.
If you add new elements, you'll probably have to
create a custom stylesheet as well.
TODO: Give examples of customization.

Creating documentation with LinuxDoc

We will now go into writing and processing documentation
using the LinuxDoc DTD.

Writing LinuxDoc

You can skip this section if you just received a LinuxDoc
file (for instance one of the Linux HOWTOs, such as you can find on the
LinuxDoc servers).

You may write LinuxDoc documents with Emacs' SGML mode, psgml. Here is a
sample:
<!doctype linuxdoc system>
<article>
<title>Quick SGML Example
<author>Matt Welsh, mdw@cs.cornell.edu
<date>v1.0, 28 March 1994
<abstract>
This document is a brief example using the Linuxdoc-SGML DTD.
</abstract>
<sect>Introduction
<p>
This is an SGML example file using the Linuxdoc-SGML DTD.
</article>
A more complete example of a LinuxDoc document is in sgml-tools.

To learn the list of legal elements, see sgml-tools (it is currently buggy: the HTML files
are compressed, which may harm your browser; TODO: does it
work? It seems HTML files are gzipped :-( File a bug report)
or see Matt Welsh's guide.

Processing LinuxDoc

You will use sgml-tools, version 1. To convert a
LinuxDoc document to HTML:
sgml2html document.sgml
To ordinary text, for instance to post it on the News:
sgml2txt document.sgml
And to PostScript, using LaTeX:
sgml2latex --output=ps document.sgml
The extension has to be .sgml or sgml-tools will do improper
things.
You can find more information in sgmltools(1) or sgmltools.v1(1).

TODO: localization in various languages.

Creating documentation with DebianDoc

Here, we will see how to write and process documentation,
using the DebianDoc DTD.

Writing DebianDoc documents

Here is a sample DebianDoc document:
FooBarBortzmeyerbortzmeyer@debian.orgTitle
Content
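For orientation, a small DebianDoc file could be created as in the sketch below. The element names (debiandoc, book, titlepag, title, author, name, email, chapt, heading, p) and the doctype line are given here as an assumption to be checked against the debiandoc-sgml documentation; the file name and the text content are made up.

```shell
# Write a small DebianDoc document to myfile.dd.
# Element names are assumed from the DebianDoc DTD; verify them against
# the debiandoc-sgml documentation. All text content is hypothetical.
cat > myfile.dd <<'EOF'
<!doctype debiandoc system>
<debiandoc>
  <book>
    <titlepag>
      <title>My first DebianDoc document</title>
      <author>
        <name>A. N. Author</name>
        <email>author@example.org</email>
      </author>
    </titlepag>
    <chapt>
      <heading>Title</heading>
      <p>Content</p>
    </chapt>
  </book>
</debiandoc>
EOF
```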
To know the list of legal tags, see debiandoc-sgml or debiandoc-sgml-doc (Bug #47300).

Processing DebianDoc documents

To translate to PostScript:
debiandoc2ps -1 myfile.dd
And to HTML:
debiandoc2html myfile.dd

Tools

This section is no longer oriented toward tasks, but toward
software that you can use to write and process SGML. The simplest way to
get all these tools is to install
task-sgml. Otherwise, you'll have to
install several packages. Here is the
apt command which will do it for
you (provided that apt
has been configured properly beforehand):
apt-get install docbook docbook-doc sp jade \
docbook-stylesheets jadetex debiandoc-sgml \
psgml

PSGML

An excellent SGML mode for Emacs. Among its many features,
it can:

- Show you what tags are valid at a given point,
- Insert tags (begin and end, as well as
  mandatory tags in between) from a menu which shows only valid
  tags (this is tremendously useful when you start to use a new
  and complicated DTD),
- Manipulate SGML elements, move
  according to elements, etc.

Its documentation is in the psgml package.
Having some options set up in your ~/.emacs will
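ease your use of psgml. Here are some examples:
;; Each entry is a menu label followed by the declaration text
;; (as a string) that psgml inserts for it.
(setq sgml-custom-dtd '(
  ( "HTML 4.0 Blaireau"
    "<!DOCTYPE ...>" )
  ( "DocBook 3.1 XML Article"
    "<?xml ...?>
<!DOCTYPE ...>" )
))
(setq sgml-insert-missing-element-comment nil)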
Among the most useful psgml commands:

- C-c C-t:
  sgml-list-valid-tags
  reminds you
  (or teaches you) the DTD. Very convenient when you start
  playing with a monster like DocBook.
- C-c C-e: sgml-insert-element. Again, a great
  way to learn a DTD.

nsgmls

An SGML tool, for instance for validating SGML documents. A
typical use is to check the validity of a document:
nsgmls -s file.sgml
This will check whether or not the contents of the file
file.sgml conform to the DTD indicated
in the header of the file.

If you write XML documents, two options of nsgmls are
necessary:
nsgmls -s -wxml /usr/lib/sgml/declaration/xml.decl file.sgml
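When several files have to be checked, the call can be wrapped in a small shell function. This is only a sketch: the function name validate_all is made up, and it assumes that nsgmls returns a non-zero exit status when it reports errors.

```shell
# Path of the XML declaration, as used throughout this HOWTO.
XML_DECL=/usr/lib/sgml/declaration/xml.decl

# validate_all FILE...: run nsgmls on each file and report the result.
# Returns 0 if every file passed, 1 otherwise (hypothetical helper).
validate_all() {
    rc=0
    for f in "$@"; do
        if nsgmls -s -wxml "$XML_DECL" "$f"; then
            echo "$f: no errors reported"
        else
            echo "$f: errors found"
            rc=1
        fi
    done
    return $rc
}
```

A typical call would be validate_all *.db, for instance from the validate target of a Makefile.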
nsgmls is a part of the
sp package, so the documentation of sp may be useful
too.

rxp

A pure XML tool; it can, for instance, be used to validate XML
documents.

jade

TODO: We should mention OpenJade! http://jade-cvs.avionitek.com/

jade is a DSSSL
processor. It takes an SGML file and a
stylesheet, written in the DSSSL language, and produces output
in the TeX (from which PostScript can be made), RTF or HTML
formats.
It has
no backend for groff and therefore has trouble producing
ASCII. The TeX backend produces jadetex files.

The documentation is not really clear but it at least
tells you the various options. See the jade package.

Typical uses:
jade -t backend-to-use -d stylesheet-name input-file

jadetex

http://www.tug.org/applications/jadetex/

A set of TeX macros to process
the output of jade. Poorly documented and difficult to
customize. As with all TeX macros, several runs may be
necessary, in particular to resolve references.

SGMLtools

SGMLtools exists in two versions, 1 and 2. "SGMLtools" is
version 2.

Unlike sgml-tools, version 1, which processes LinuxDoc
documents, SGMLtools, version 2, treats DocBook
documents. You can do everything it does with direct calls
to jade but it may be simpler to use SGMLtools.

sgml-tools, version 1

(Did you notice the change in the
capitalization?)
This version is officially deprecated and should,
in theory, no longer be used. But, in practice, since
the move of the Linux Documentation Project from LinuxDoc to
the DocBook DTD never occurred, you still need sgml-tools
version 1.

Norman Walsh's "DocBook Modular Stylesheets"

These are a set of DSSSL stylesheets (with a recent XSL version). You
can use them with any DSSSL tool, like jade, to process DocBook documents.

References

SGML in general

- Cover's
  page
- A Gentle Introduction to SGML by the TEI people. Not very
  practical, IMHO. TODO: read it again, Sam.

XML in general

- Official XML
- Cover's XML
  page
- XML FAQ

DocBook

- Official
  DocBook
- docbook-doc
- Modular DocBook Stylesheets
- FreeBSD Documentation Project Primer is a nice introduction to
  SGML and
  DocBook
- Simplified DocBook, a version of DocBook with
  fewer elements to learn
- Different customizations or extensions to
  DocBook: onShore's custom
  stylesheets are in an unofficial Debian package,
  "onshore-sgml". To get it, add "deb
  http://cafe.onshore.com/debian local/" to your
  /etc/apt/sources.list. Norman Walsh puts online all the stuff needed
  to manage his Web site. FreeBSD has DocBook
  customizations, too.

LinuxDoc

- Matt
  Welsh's SGML-Tools User's Guide

Other operating systems

This section lists
documents similar to this HOWTO (I mean
practical documents) for operating systems
other than Debian.

- Microsoft
  Windows NT
- RedHat users of DocBook should probably see the
  Cygnus tools.
- For FreeBSD, if you want to use DocBook, see their
  FreeBSD
  Documentation Project Primer for New
  Contributors, especially its list of mandatory tools.
- SuSE has some tools to process DocBook documents easily.
- It is not operating system-specific but O'Reilly has
  nice
  documentation about their publishing system.

Interesting books

- SGML CD, Bob DuCharme, Prentice-Hall, ISBN 0-13-475740-8. A very good and practical book about the tools
  needed to write and process SGML on Unix and Windows
  NT. Does not cover XML. A very good chapter about psgml
  and a nice page of PSGML tricks.
- DocBook: The Definitive Guide, Norman Walsh and Leonard Muellner, O'Reilly, ISBN 1-56592-580-7. I have not read it yet. The entire book is
  also online.