Checking DocBook documents for grammatical correctness.
- Purpose
- Download the reference grammar
- Checking with xmllint
- Checking with jing
- Download jing
- Write a script to operate jing comfortably
- Executing "sh docjing.sh myFile.xml" will yield
- A word of wisdom...
Purpose
- The fact that a DocBook document is displayed correctly via XSLT does not ensure that it conforms to the official standard.
For instance, the HelloDoc.xml file that we wrote in class looks nice but is not valid.
XSLT rules readily apply to ungrammatical constructs too, especially in DocBook, where many grammar rules may seem overly strict (see the HelloDoc.xml example below).
- However, from the point of view of software development, it is essential to deliver documents that do conform to the standard.
Therefore, the validity of your files with respect to the DocBook grammar is an explicit requirement of the XML project.
- Here are some guidelines to help you actually check your DocBook documents against the grammar.
Download the reference grammar
- Download http://docs.oasis-open.org/docbook/rng/5.0/docbook.rng (RELAX NG in XML syntax; the compact syntax is not directly usable here), call it docbook.rng and store it somewhere: /My/.../Path/to/docbook.rng
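The download can also be scripted. The sketch below assumes that curl is installed; the function name fetch_docbook_rng is hypothetical. It fetches the grammar from the URL above into the file you name, and does nothing if that file already exists and is non-empty, so it is safe to call repeatedly.

```shell
#!/bin/sh
# Hypothetical helper: download the DocBook 5.0 RELAX NG grammar (XML syntax)
# into the file named by $1, unless that file already exists and is non-empty.
# Assumes curl is installed.
fetch_docbook_rng() {
  dest=$1
  url=http://docs.oasis-open.org/docbook/rng/5.0/docbook.rng
  if [ -z "$dest" ]; then
    echo "usage: fetch_docbook_rng /path/to/docbook.rng" >&2
    return 2
  fi
  if [ -s "$dest" ]; then
    # Already downloaded: nothing to do.
    return 0
  fi
  # -f: fail on HTTP errors, -s/-S: no progress bar but still report errors,
  # -L: follow redirects, -o: write to the destination file
  curl -fsSL -o "$dest" "$url"
}
```

For example: fetch_docbook_rng /My/.../Path/to/docbook.rng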
Checking with xmllint
- The command
xmllint --noout --relaxng /My/.../Path/to/docbook.rng myFile.xml
yields
- "myFile.xml validates" if myFile.xml is indeed valid;
- error messages that may be entirely misleading if the file is not valid.
Example: the HelloDoc.xml file that we wrote in class
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="doc2xhtml.xsl"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
<info>
  <title>Hello World</title>
  <author>
    <personname>
      <firstname>Jean-François</firstname>
      <surname>Perrot</surname>
    </personname>
  </author>
</info>
<sect1><title>First</title>
  <para><quote>This is a famous saying</quote>
  And this is regular text</para>
</sect1>
<sect1><title>Image</title>
  <mediaobject>
    <imageobject>
      <imagedata fileref="OASIS.png" />
    </imageobject>
    <caption>This is the OASIS Logo</caption>
  </mediaobject>
</sect1>
</article>
HelloDoc.xml:14: element sect1: Relax-NG validity error : Did not expect element sect1 there
HelloDoc.xml:18: element sect1: Relax-NG validity error : Element article has extra content: sect1
HelloDoc.xml fails to validate
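The first two lines are completely misleading: nothing is wrong with the sect1 elements as such (the actual error, revealed by jing below, lies in the caption on line 23). Only the final message is correct.
Therefore, you are strongly advised to use jing!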
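Checking with jing
Download jing
- Get it from http://code.google.com/p/jing-trang/ (Google Code has since been shut down; the project is now hosted on GitHub as relaxng/jing-trang) and store jing.jar somewhere: /The/.../Path/to/jing-20091111/bin/jing.jar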
Write a script to operate jing comfortably
- Suppose you call it docjing.sh:
# Checking DocBook files against the RNG grammar
JingJar=/The/.../Path/to/jing-20091111/bin/jing.jar
DocGram=/My/.../Path/to/docbook.rng
java -jar "$JingJar" "$DocGram" "$1"
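If you prefer a slightly more defensive docjing.sh, here is one possible variant (a sketch, not a required form). It keeps the same two placeholder paths, which you still have to edit, quotes every variable so that paths containing spaces work, stops with a usage message when no file is given, checks that each file is readable before starting Java, and passes all file names on to jing, which accepts several instance documents after the grammar. The function name docjing is only an illustrative choice.

```shell
#!/bin/sh
# docjing.sh (variant): check DocBook files against the RNG grammar.
# Edit these two placeholder paths to match your installation.
JingJar=/The/.../Path/to/jing-20091111/bin/jing.jar
DocGram=/My/.../Path/to/docbook.rng

docjing() {
  # Refuse to run without at least one file to check.
  if [ "$#" -eq 0 ]; then
    echo "usage: sh docjing.sh file.xml [more.xml ...]" >&2
    return 2
  fi
  # Catch typos in file names before starting the JVM.
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "docjing: cannot read $f" >&2
      return 2
    fi
  done
  # jing prints nothing when every file is valid.
  java -jar "$JingJar" "$DocGram" "$@"
}

docjing "$@"
```

Use it as before (sh docjing.sh HelloDoc.xml), or as sh docjing.sh *.xml to check every DocBook file of the current directory in one run.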
Executing "sh docjing.sh myFile.xml" will yield
- nothing if myFile.xml is indeed valid;
- a rather verbose but usable error message otherwise:
jfp$ sh docjing.sh HelloDoc.xml
/Users/jfp/Sites/EPITA/International/Site2015b/Session5/DocBook/HelloDoc.xml:23:34:
error: text not allowed here;
expected element "address", "anchor", "annotation", "bibliolist",
"blockquote", "bridgehead", "calloutlist", "caution", "classsynopsis",
"cmdsynopsis", "constraintdef", "constructorsynopsis",
"destructorsynopsis", "epigraph", "equation", "example",
"fieldsynopsis", "figure", "formalpara", "funcsynopsis", "glosslist",
"important", "indexterm", "info", "informalequation",
"informalexample", "informalfigure", "informaltable", "itemizedlist",
"literallayout", "mediaobject", "methodsynopsis", "msgset", "note",
"orderedlist", "para", "procedure", "productionset", "programlisting",
"programlistingco", "qandaset", "remark", "revhistory", "screen",
"screenco", "screenshot", "segmentedlist", "sidebar", "simpara",
"simplelist", "synopsis", "table", "task", "tip", "variablelist" or
"warning"
/Users/jfp/Sites/EPITA/International/Site2015b/Session5/DocBook/HelloDoc.xml:23:44:
error: element "caption" incomplete;
expected element "address", "anchor", "annotation", "bibliolist",
"blockquote", "bridgehead", "calloutlist", "caution", "classsynopsis",
"cmdsynopsis", "constraintdef", "constructorsynopsis",
"destructorsynopsis", "epigraph", "equation", "example",
"fieldsynopsis", "figure", "formalpara", "funcsynopsis", "glosslist",
"important", "indexterm", "info", "informalequation",
"informalexample", "informalfigure", "informaltable", "itemizedlist",
"literallayout", "mediaobject", "methodsynopsis", "msgset", "note",
"orderedlist", "para", "procedure", "productionset", "programlisting",
"programlistingco", "qandaset", "remark", "revhistory", "screen",
"screenco", "screenshot", "segmentedlist", "sidebar", "simpara",
"simplelist", "synopsis", "table", "task", "tip", "variablelist" or
"warning"
jfp$
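Since every error line repeats the full list of expected elements, it can help to trim that list away. The sketch below assumes that jing prints each error on a single line, so that the line breaks in the transcript above are only due to the page layout; the function name shorten_jing is hypothetical. It drops the directory part of the file name and everything from "; expected" onwards, keeping the position and the first clause of each message.

```shell
#!/bin/sh
# Hypothetical filter: shorten jing error messages read on standard input.
# Assumes one error per line, in the form  path:line:column: error: message
shorten_jing() {
  # 1st expression: drop the directory part of the file name
  #   (everything up to the last "/" before the first ":").
  # 2nd expression: drop the long "; expected ..." list at the end.
  sed -e 's|^[^:]*/||' -e 's/; expected .*$//'
}
```

For example: sh docjing.sh HelloDoc.xml 2>&1 | shorten_jing. Under the assumption above, the two errors shown would shrink to lines ending in "text not allowed here" and "element "caption" incomplete".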
Note that line 23 of our text reads "<caption>This is the OASIS Logo</caption>".
From the double diagnosis "text not allowed here" and "element "caption" incomplete", together with the list of expected elements (mostly block-level elements such as para, note or table), we deduce that the text needs a wrapper element (as is very often the case), for instance <para>.
Indeed, "<caption><para>This is the OASIS Logo</para></caption>" proves to be correct.
A word of wisdom...
Do check your files at regular intervals while writing.
Do not wait until you get a mass of unreadable error messages!
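Checking at regular intervals is easier with a one-line verdict per file. The following sketch assumes that docjing.sh, as written above, is in the current directory; check_all and jing_one are hypothetical names. A file is reported as OK only when the checker exits with status 0 and prints nothing, in line with jing printing nothing for a valid file.

```shell
#!/bin/sh
# Hypothetical helpers: check several DocBook files, one verdict line per file.

# Check one file with the docjing.sh script (assumed to be in the
# current directory).
jing_one() {
  sh docjing.sh "$1"
}

# check_all CHECKER FILE...
# Runs CHECKER on each FILE. A file passes only if CHECKER exits with
# status 0 and prints nothing (on stdout or stderr).
# Returns 1 if at least one file failed, 0 otherwise.
check_all() {
  checker=$1
  shift
  failed=0
  for f in "$@"; do
    if out=$("$checker" "$f" 2>&1) && [ -z "$out" ]; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: check_all jing_one *.xml, then run sh docjing.sh on each file reported as FAIL to read the detailed messages.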