Checking DocBook documents for grammatical correctness.

Jean-François Perrot

Purpose
Download the reference grammar
Checking with xmllint
Checking with jing

Purpose
- The fact that a DocBook document displays correctly via XSLT does not guarantee that it conforms to the official standard.
  For instance, the HelloDoc.xml that we wrote in class looks nice but is not valid.
  XSLT rules readily apply to ungrammatical constructs as well, especially in DocBook, where many grammar rules may seem overly strict (see a typical example here).
- However, from the point of view of software development, it is essential to deliver documents that do conform to the standard.
  Therefore, the validity of your files with respect to the DocBook grammar is an explicit requirement of the XML project.
- Here are some indications to help you actually check your DocBook documents against the grammar.
Download the reference grammar
http://docs.oasis-open.org/docbook/rng/5.0/docbook.rng (RELAX NG in XML format; the compact format is not directly usable).

Call it docbook.rng and store it somewhere: /My/.../Path/to/docbook.rng
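
The download can also be scripted. The following is only a minimal sketch, assuming curl is installed; the function name fetch_grammar and its default destination ./docbook.rng are illustrative choices, not part of the course material.

```shell
#!/bin/sh
# Sketch: fetch the DocBook 5.0 RELAX NG grammar (XML syntax).
# fetch_grammar is a hypothetical helper name; the URL is the one given above.
GRAMMAR_URL=http://docs.oasis-open.org/docbook/rng/5.0/docbook.rng

fetch_grammar() {
    # $1: where to store the grammar (assumed default: ./docbook.rng)
    dest=${1:-./docbook.rng}
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # -L: follow redirects, -o: write to the given file
    curl -fsSL -o "$dest" "$GRAMMAR_URL"
}
```

After sourcing this file, "fetch_grammar /My/.../Path/to/docbook.rng" stores the grammar at the path of your choice.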
Checking with xmllint
xmllint --noout --relaxng /My/.../Path/to/docbook.rng myFile.xml
yields
- the message "myFile.xml validates" if myFile.xml is indeed valid
- error messages that may be entirely misleading if the file is not valid
  
  Example: the HelloDoc.xml file that we wrote in class
  
  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="doc2xhtml.xsl"?>
  <article xmlns="http://docbook.org/ns/docbook" version="5.0"
           xmlns:xlink="http://www.w3.org/1999/xlink">
    <info>
      <title>Hello World</title>
      <author>
        <personname>
          <firstname>Jean-François</firstname>
          <surname>Perrot</surname>
        </personname>
      </author>
    </info>
    <sect1><title>First</title>
      <para><quote>This is a famous saying</quote>
      And this is regular text</para>
    </sect1>
    <sect1><title>Image</title>
      <mediaobject>
        <imageobject>
          <imagedata fileref="OASIS.png" />
        </imageobject>
        <caption>This is the OASIS Logo</caption>
      </mediaobject>
    </sect1>
  </article>
  
  HelloDoc.xml:14: element sect1: Relax-NG validity error : Did not expect element sect1 there
  HelloDoc.xml:18: element sect1: Relax-NG validity error : Element article has extra content: sect1
  HelloDoc.xml fails to validate
  
  The first two messages are completely misleading; only the final one is correct.
  Therefore, you are strongly advised to use jing!
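
Since only xmllint's final verdict can be trusted here, one option is to discard its messages and rely on its exit status alone. A minimal sketch, assuming xmllint is on the PATH and exits with status 0 only when validation succeeds; the function name xmllint_verdict is a hypothetical helper, and the grammar path is passed as an argument.

```shell
#!/bin/sh
# Sketch: report only whether a file validates, ignoring xmllint's messages.
# xmllint_verdict is a hypothetical helper name.
xmllint_verdict() {
    # $1: path to docbook.rng, $2: DocBook file to check
    if xmllint --noout --relaxng "$1" "$2" >/dev/null 2>&1; then
        echo "$2: valid"
    else
        echo "$2: NOT valid (run jing for a usable diagnosis)"
        return 1
    fi
}
```

For example, "xmllint_verdict /My/.../Path/to/docbook.rng HelloDoc.xml" prints a single line and leaves the detailed diagnosis to jing.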
Checking with jing
1. Download jing
  http://code.google.com/p/jing-trang/
  and store jing.jar somewhere: /The/.../Path/to/jing-20091111/bin/jing.jar
2. Write a script to operate jing comfortably
  - suppose you call it docjing.sh
  
  # Check a DocBook file against the RNG grammar
  JingJar=/The/.../Path/to/jing-20091111/bin/jing.jar
  DocGram=/My/.../Path/to/docbook.rng
  java -jar "$JingJar" "$DocGram" "$1"
3. Executing "sh docjing.sh myFile.xml" will yield
  - nothing if myFile is indeed valid
  - a rather verbose but usable error message otherwise:
    
    jfp$ sh docjing.sh HelloDoc.xml
    /Users/jfp/Sites/EPITA/International/Site2015b/Session5/DocBook/HelloDoc.xml:23:34: error: text not allowed here; expected element "address", "anchor", "annotation", "bibliolist", "blockquote", "bridgehead", "calloutlist", "caution", "classsynopsis", "cmdsynopsis", "constraintdef", "constructorsynopsis", "destructorsynopsis", "epigraph", "equation", "example", "fieldsynopsis", "figure", "formalpara", "funcsynopsis", "glosslist", "important", "indexterm", "info", "informalequation", "informalexample", "informalfigure", "informaltable", "itemizedlist", "literallayout", "mediaobject", "methodsynopsis", "msgset", "note", "orderedlist", "para", "procedure", "productionset", "programlisting", "programlistingco", "qandaset", "remark", "revhistory", "screen", "screenco", "screenshot", "segmentedlist", "sidebar", "simpara", "simplelist", "synopsis", "table", "task", "tip", "variablelist" or "warning"
    /Users/jfp/Sites/EPITA/International/Site2015b/Session5/DocBook/HelloDoc.xml:23:44: error: element "caption" incomplete; expected element "address", "anchor", "annotation", "bibliolist", "blockquote", "bridgehead", "calloutlist", "caution", "classsynopsis", "cmdsynopsis", "constraintdef", "constructorsynopsis", "destructorsynopsis", "epigraph", "equation", "example", "fieldsynopsis", "figure", "formalpara", "funcsynopsis", "glosslist", "important", "indexterm", "info", "informalequation", "informalexample", "informalfigure", "informaltable", "itemizedlist", "literallayout", "mediaobject", "methodsynopsis", "msgset", "note", "orderedlist", "para", "procedure", "productionset", "programlisting", "programlistingco", "qandaset", "remark", "revhistory", "screen", "screenco", "screenshot", "segmentedlist", "sidebar", "simpara", "simplelist", "synopsis", "table", "task", "tip", "variablelist" or "warning"
    jfp$
    
    Note that line 23 of our text reads "<caption>This is the OASIS Logo</caption>".
    From the double diagnosis ("text not allowed here" and "element "caption" incomplete") we deduce that a wrapper element is needed, as is very often the case, for instance "<para>".
    Indeed, "<caption><para>This is the OASIS Logo</para></caption>" proves to be correct.
4. A word of wisdom...
  Do check your files at regular intervals while writing.
  Do not wait until you get a mass of unreadable error messages!
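
To make regular checking painless, the docjing.sh call can be wrapped in a loop over several files, and the long "expected element ..." lists can be trimmed from jing's messages. A minimal sketch, assuming docjing.sh sits in the current directory and prints nothing for a valid file, as described above; the names trim_jing, check_all and DOCJING are illustrative, not part of jing.

```shell
#!/bin/sh
# Sketch: validate several DocBook files and shorten jing's messages.
# trim_jing, check_all and DOCJING are hypothetical names.
DOCJING=${DOCJING:-./docjing.sh}

# Drop the long '; expected element ...' tail from each message line.
trim_jing() {
    sed 's/; expected .*$//'
}

# Run the checker on every file given as argument;
# return 1 if at least one file produced messages.
check_all() {
    rc=0
    for f in "$@"; do
        out=$(sh "$DOCJING" "$f" 2>&1 | trim_jing)
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            rc=1
        else
            echo "$f: OK"
        fi
    done
    return $rc
}
```

After sourcing this file, "check_all *.xml" checks every XML file in the current directory. Trimming hides the list of allowed elements, so rerun docjing.sh directly on a file when you need the full message.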
