LaTeX vs DocBook

I’m about to embark on a documentation project, and I’ve been thinking quite a bit about the tools I’d ideally like to use for the job.

Previously, when I’ve done a lot of writing, it’s been with LaTeX, or on rare, painful occasions with a word processor or a dedicated tech pubs tool like FrameMaker.

What I like about using a text-based markup language is the ability to use normal revision control tools to see the evolution of what I’ve been doing. Ignoring any issues surrounding whether word processors are more or less appropriate than feeding text markup into a compiler, trying to use revision control tools with opaque word processor files is pointless, as I get no information more useful than a blob; trying to use the internal change-tracking features of word processors is worse.

I have had something of a problem with my choice of markup language. My normal default would be to use LaTeX, but it’s long been difficult to produce good HTML from LaTeX; people have a reasonable expectation that what they’re reading will look attractive on the web. The standard tool for producing HTML from LaTeX was long latex2html, which produces atrocious, unnavigable HTML.

Believing that this was still the case, my fallback plan was to use DocBook, but I am finding the markup to be painfully obtrusive. Worse, the pipeline for producing HTML and PDF files is opaque to me; I crank a pile of stuff into Apache FOP, bits of formatted sausage come out the other end, and I don’t really know what happens in the middle. The typeset PDF files produced by FOP, at least using the style sheets I’ve borrowed from the Subversion Book, look clunky, which makes me unhappy. It had been looking like I could get either attractive PDF or HTML output with a particular choice of tool, but not both. At least, not unless I wanted to go down the tortured path of converting DocBook to LaTeX for PDFs.

Some determined Googling this evening yielded the TeX4ht package, which produces XHTML, uses CSS, and seems well integrated into the normal TeX environment. I also found some likely knobs that I can try twiddling with pdfTeX, to get it to use hyperlinks more liberally while still typesetting my document beautifully.
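A minimal sketch of what that two-output workflow might look like follows. It assumes the hyperlink knobs come from the hyperref package (the post does not say which ones were found), and the sample document, its file name, and the function names are hypothetical. `htlatex` is the TeX4ht front end, and both tools are skipped when they are not installed.

```shell
#!/bin/sh
# Sketch: one LaTeX source, two outputs (XHTML via TeX4ht, PDF via pdfTeX).
# The sample document and the function names are illustrative assumptions.

# write_sample FILE: write a tiny LaTeX document that loads hyperref.
write_sample() {
    cat > "$1" <<'EOF'
\documentclass{article}
\usepackage[colorlinks=true]{hyperref}
\begin{document}
\section{Intro}\label{sec:intro}
See Section~\ref{sec:intro} or \url{https://example.org/}.
\end{document}
EOF
}

# build_all FILE: produce XHTML and PDF from FILE, skipping missing tools.
build_all() {
    src=$1
    if command -v htlatex >/dev/null 2>&1; then
        # The "xhtml" option asks TeX4ht for XHTML output.
        htlatex "$src" "xhtml"
    else
        echo "htlatex not found; skipping XHTML" >&2
    fi
    if command -v pdflatex >/dev/null 2>&1; then
        # Two passes so that cross-references and links resolve.
        pdflatex -interaction=nonstopmode "$src" &&
            pdflatex -interaction=nonstopmode "$src"
    else
        echo "pdflatex not found; skipping PDF" >&2
    fi
}

# Usage (not run automatically):
#   write_sample sample.tex && build_all sample.tex
```

Keeping the two builds in one function means the same source always feeds both outputs, which is the whole point of staying in LaTeX.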

So the main advantage that DocBook held for me, the decent XHTML and CSS output I could produce for negligible effort, seems gone. I think I can return to the familiar free tools that I’ve been using for over a decade. What a relief!

Posted in software
13 comments on “LaTeX vs DocBook”
  1. Cameron Kerr says:

    I’ve recently embarked on a similar task, converting all my IT courseware from LaTeX and MS Word to Docbook XML. Part of this was because I wanted to be able to produce HTML easily, as the course is to be sold to other universities in subsequent years, and I wanted to make it easier to customise.

    If you use a good _validating_ XML editor, you’ll find markup to be not too obtrusive. There are WYSIWYG editors for Docbook, but I haven’t really seriously used them. I use Emacs nXML mode, which is really useful (I normally prefer vim, and I learned Emacs solely for nxml-mode). Because it’s a validating mode, it will be able to complete tag names, and show you if a piece of input is erroneous. Look around for the docbook-menu package as well, which will be useful for accessing Docbook: The Definitive Guide, and docbook-xsl.

    One of the great things about DocBook XML is the support for profiling, which means I can easily create one version for students and another for instructors, which has answers and teacher’s notes.

    The toolset includes: Mac OS X, xsltproc (as provided by OS X), FOP (downloaded from …), and the stylesheets provided via Fink.

    To tie it all together into a convenient bundle for myself, I created some shell scripts that support things like chunking and the teacher profile.

    I have a number of scripts, one for each output format (html or pdf) and input format (book or article). I also have another script which calls the others to produce common outputs I’m interested in.

    (Note: on Mac OS X, the ‘open’ command will open a file with the usual application. You’ll want to change that for others, as well as the location of the stylesheets.)

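    #!/bin/bash
    # Partial reconstruction: parts of this script were lost when it was posted.
    # The option letters (-t, -o), the DOCBOOK_XSL variable, and the .fo
    # intermediate file name are assumptions, not the original values.

    args=$(getopt to "$@")
    if [ $? -ne 0 ]; then
        echo "Usage: $0 [-t] [-o] input.xml" >&2
        exit 2
    fi
    set -- $args

    teacher=no
    openwhendone=no
    for i; do
        case "$i" in
            -t) teacher=yes; shift;;
            -o) openwhendone=yes; shift;;
            --) shift; break;;
        esac
    done

    input=${1:?no input file given}
    # Point this at your FO stylesheet, e.g. docbook-xsl's fo/profile-docbook.xsl.
    stylesheet=${DOCBOOK_XSL:?set DOCBOOK_XSL to the FO stylesheet}
    # Strip only the final extension from the file name.
    base=$(basename "$input" | sed -e 's/\.[^.]*$//')

    stringparams=""
    if [ "$teacher" = yes ]; then
        stringparams="$stringparams --stringparam profile.userlevel teacher"
    else
        stringparams="$stringparams --stringparam profile.userlevel student"
    fi

    echo "### $input into PDF; teacher=$teacher" >&2

    # Rebuild only when the PDF is missing or older than the input.
    if [ ! -e "$base.pdf" ] || [ "$base.pdf" -ot "$input" ]; then
        xsltproc \
            --stringparam generate.toc 0 \
            --stringparam paper.type A4 \
            --stringparam body.font.master 11 \
            $stringparams \
            "$stylesheet" "$input" > "$base.fo"
        fop "$base.fo" "$base.pdf"
        rm "$base.fo"
    else
        echo "Output is up-to-date, not recreating"
    fi

    if [ "$openwhendone" = 'yes' ]; then
        open "$base.pdf" 2>/dev/null
    fi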

  2. bos says:

    Cameron, I’m glad you’ve had a good experience with DocBook. My friends who wrote the Subversion book are quite happy with it, too. I found myself spending more time trying to understand DocBook than I did writing early on, as you might expect.

    The clincher was that I preferred the PDF typesetting from LaTeX over what I could get from FOP, and I had a decade’s experience with the toolchain to guide me through the rough spots, where FOP was for me a black box.

    I wouldn’t necessarily advise anyone to follow in my footsteps, but I’m still happy with the choice I made.

  3. Cameron Kerr says:

    Bothersome markup systems. I’ve posted my files on my blog.

  4. Carl says:

    explains how to print a book using XHTML, CSS, and “prince”, an XHTML-to-PDF utility. (Not free though. :/) I’ve been using those in conjunction with and getting pretty good-looking term papers. Your results may vary though.

  5. larsivi says:

    I’m working on getting DocBook in as the documentation format for my current client. LaTeX wasn’t really an option, as they’re very familiar with XML from other projects. I also find that even though XSL-FO is rather complex, the neatness of the docbook-xsl set will allow for very powerful possibilities, such as customizing output for the client’s clients.

    The docbook-xsl set itself is big, but clean, and I’ve found the book to be invaluable in setting up customized style sheets. I’m also considering how additional attribute usage (there are a couple in the DocBook spec that you’re free to “interpret” yourself) can be used to create more dynamic webpages, especially for software documentation that might be different for users of different platforms. Thus a user might log in, say his preference is MacOSX docs, and the stylesheets filter out what is not necessary. Note that this process is probably too heavy to run fully dynamically, but it can be used to create seemingly dynamic doc content.

    XMLmind is a neat freeware editor that lets you write WYSIWYG, but without letting you forget that you’re really editing XML. I have tried different options just to get a feel for what is best; OpenOffice can save as DocBook, and this is an ok’ish solution for the first draft. It is not perfect though, as it drops quite a bit of formatting, which needs to be added to the XML afterwards. For this I tend to use Vim. I believe the XML additions let you do validating editing, but they are reported to conflict with some DocBook-specific scripts I’ve been using. I like the WYSIWYG part of XMLmind, but for larger edits and changes to tags, something else is needed, read: Vim.
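The per-platform filtering described in this comment could be pre-generated rather than run on every request. The sketch below is one way to do that with the profiling stylesheets in docbook-xsl and their `profile.os` parameter. The platform names, output file names, function names, and the `XSLTPROC` override are assumptions for illustration, and the stylesheet path has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: pre-generate one HTML file per platform with DocBook profiling.
# Platform names, output names, and the XSLTPROC override are assumptions.

XSLTPROC=${XSLTPROC:-xsltproc}

# profile_for_os STYLESHEET INPUT OS
# Keeps elements whose os attribute matches OS, plus those without one.
profile_for_os() {
    "$XSLTPROC" \
        --stringparam profile.os "$3" \
        --output "manual-$3.html" \
        "$1" "$2"
}

# build_all_platforms STYLESHEET INPUT
# Runs the profiling transform once for each platform in the list.
build_all_platforms() {
    for os in macosx linux windows; do
        profile_for_os "$1" "$2" "$os" || return 1
    done
}

# Usage (not run automatically), with a profiling stylesheet such as
# docbook-xsl's html/profile-docbook.xsl:
#   build_all_platforms /path/to/html/profile-docbook.xsl manual.xml
```

A web front end would then only need to pick the file matching the user's stated preference, which gives the "seemingly dynamic" effect without transforming anything at request time.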

  6. Chris Double says:

    Have you looked at AsciiDoc?

    It can produce nice HTML and PDF files. There’s a docbook backend and a latex backend.

  7. bos says:

    Lars, I noticed after using Tex4ht for a while that it could apparently generate DocBook XML. So it is at least theoretically possible that if you like LaTeX (I don’t know if you do), you could work in LaTeX while providing data to your clients in their preferred format.

  8. bos says:

    Chris, yes, I’ve used AsciiDoc, though not on especially large projects. It’s nice for writing man pages and the like (that’s about the extent of my experience), and the toolset around it is good.

  9. Tom Emerson says:

    Don’t forget the old standby Texinfo, which produces decent PDF documentation (via TeX), HTML, and probably multitudinous other formats.

  10. John says:

    Tom, I’ve used Texinfo too. It’s like a slimmed down LaTeX (except it uses ‘@’ signs instead of backslashes). I like it, though if you are documenting something with a lot of ‘@’ signs in it (like Perl or Ruby code, for example), I guess it could be slightly annoying at times. It’s still actively maintained, which is nice.

    I’m not crazy about the info tool, but I really like the pdf/dvi output from Texinfo (since it’s TeX-generated). The html output is plenty handy too (though, haven’t tried it with actual mathematics yet :) ).

    It’s always surprised me that there’s not really a *standard* latex-to-html tool out there. It seems like such a no-brainer that that’s really what the LaTeX community needs. Last time I checked, the latex2html project seemed to have pretty much stalled out (also had trouble building it on my system), and Tex4ht’s generating html from dvi seems kinda wacky to me (though I know next to nothing about dvi’s internal format).

    One thing coming down the pike that looks pretty interesting to me is Perl 6’s Pod. The spec is nearing completion, and, for now, can be found here. Could be a very nice general doc format, even if you’re not interested in Perl.

  11. Lon says:

    By now I guess perhaps you’ve discovered dblatex…? DocBook to LaTeX to PDF.

  12. osun says:

    I googled “latex vs docbook”, and got this post as the first result. I love LaTeX too. I noticed you’ve published some books with O’Reilly, but on O’Reilly’s pages for authors, I could only find their templates for MS Word and DocBook, and on they say LaTeX is in the “strongly discouraged” list. Could I ask whether you provided your book to O’Reilly in LaTeX format?

  13. I tend not to drop a lot of responses, but i did some searching and wound up here LaTex vs DocBook
    | teideal glic deisbhéalach. And I actually do have 2 questions for you if it’s allright.
    Could it be simply me or does it appear like some of these responses look
    like they are left by brain dead folks? 😛 And, if you are posting at
    other online sites, I would like to keep up with anything new you have to post.
    Would you make a list of every one of your communal pages like your twitter feed, Facebook
    page or linkedin profile?

