XML-Schemas without Namespaces
Jean-François Perrot
- General ideas
- Shortcomings of DTDs
- XML-Schemas are the W3C's answer
- Types for XML trees
- Datatypes
- Simple types
- Complex types
- Three ways of giving a type to an element or attribute
- Declaring types as separate entities (with their own names)
- Associating an anonymous type directly to an element (attribute)
- Reference
- Extending a schema: xsd:include / xsd:import
- Schema-based Validation
- Validation strategy
- Schema-based validation with strategy A
- Schema-based validation with strategy B
- Grammar vs. Type System
- General ideas
- Shortcomings of DTDs
- DTDs do not accommodate namespaces, i.e. namespace prefixes are seen as integral parts of names.
- DTDs have a limited capacity to specify constraints on strings (no regular expressions!).
- XML-Schemas are the W3C's answer to the need to improve on DTDs for specifying the structure of XML documents.
The complexity of the system prompted the OASIS consortium to come up with another proposal (Relax-NG), which is easier to use.
However, XML-Schemas remain the standard in a number of protocols, notably Web Services.
- Types for XML trees
XML-Schemas are written in XML syntax, with the namespace xmlns:xsd="http://www.w3.org/2001/XMLSchema" (see examples below).
- Datatypes for elementary data: strings, numbers, etc., e.g. xsd:string, xsd:int, xsd:date.
A (very successful) collection: http://www.w3.org/TR/xmlschema-2/
- Simple types (<xsd:simpleType>) for elements with only text content, and for attributes.
The type describes the text content (a string), mainly by restrictions on a datatype.
- Complex types (<xsd:complexType>) for everything else.
The structure of child nodes is either a sequence (xsd:sequence) or a choice (xsd:choice); xsd:all, for unordered children, also exists.
N.B. Attributes are declared after the child nodes.
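To make the distinction concrete, here is a minimal sketch, assuming a hypothetical student/mark format: a named simple type restricts xsd:string with a regular expression (exactly what DTDs cannot do), and a complex type declares a sequence of children followed by an attribute. The class name SimpleAndComplexTypes and all element names are illustrative, not taken from the course examples; validation uses the standard javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example schema; names are illustrative only.
class SimpleAndComplexTypes {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // Simple type: xsd:string restricted by a regular expression (1 or 2 digits)
        + "  <xsd:simpleType name='markType'>"
        + "    <xsd:restriction base='xsd:string'>"
        + "      <xsd:pattern value='[0-9]{1,2}'/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        // Complex type: a sequence of child elements, then the attribute declarations
        + "  <xsd:element name='student'>"
        + "    <xsd:complexType>"
        + "      <xsd:sequence>"
        + "        <xsd:element name='name' type='xsd:string'/>"
        + "        <xsd:element name='mark' type='markType'/>"
        + "      </xsd:sequence>"
        + "      <xsd:attribute name='id' type='xsd:string' use='required'/>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    // Compiles the schema text once; a broken schema is a programming error here.
    private static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if the document conforms to the schema; a validation error yields false.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this schema, `<mark>fifteen</mark>` is rejected by the pattern, a document without the id attribute is rejected, and so is one where mark comes before name (the sequence fixes the order).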
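- Three ways of giving a type to an element or attribute
- Declaring types as separate entities (with their own names), and associating the element or attribute explicitly with the type.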
<xsd:complexType name="myType">...</xsd:complexType>
<xsd:element name="myElement" type="myType"/>
<xsd:simpleType name="myAttrType">...</xsd:simpleType>
<xsd:attribute name="myAttr" type="myAttrType"/>
In this way, elements (attributes) with different names can share the same type.
In other words, types may be reused.
- Associating an anonymous type directly to an element (attribute)
<xsd:element name="myElement">
  <xsd:complexType>
    ...
  </xsd:complexType>
</xsd:element>
<xsd:attribute name="myAttr">
  <xsd:simpleType>
    ...
  </xsd:simpleType>
</xsd:attribute>
Systematic use of this technique leads to the so-called Russian doll design: no type sharing, no reuse.
Examples: Names & Marks #1, Names & Marks #2 [candidate XML files #1, #2]
- Reference
Instead of defining a type, simply state that an element (attribute) has the same name and type as another, globally declared one, by using a reference to it (the ref attribute, e.g. ref="myElement").
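The three techniques can coexist in one schema. The following minimal sketch, assuming a hypothetical students/marks format (not the actual Names & Marks files), uses (1) a named type markType shared by two differently named elements, (2) anonymous types nested in element declarations, and (3) a ref to a global element. The class name ThreeWaysOfTyping is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical schema combining the three ways of typing.
class ThreeWaysOfTyping {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // (1) Named type, declared once and shared by 'written' and 'oral'
        + "  <xsd:simpleType name='markType'>"
        + "    <xsd:restriction base='xsd:decimal'>"
        + "      <xsd:minInclusive value='0'/>"
        + "      <xsd:maxInclusive value='20'/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        // Global element declaration, so that it can be referenced with ref='name'
        + "  <xsd:element name='name' type='xsd:string'/>"
        // (2) Anonymous types nested inside element declarations (Russian doll style)
        + "  <xsd:element name='students'>"
        + "    <xsd:complexType>"
        + "      <xsd:sequence>"
        + "        <xsd:element name='student' maxOccurs='unbounded'>"
        + "          <xsd:complexType>"
        + "            <xsd:sequence>"
        // (3) Reference: same name and type as the global 'name' element
        + "              <xsd:element ref='name'/>"
        + "              <xsd:element name='written' type='markType'/>"
        + "              <xsd:element name='oral' type='markType'/>"
        + "            </xsd:sequence>"
        + "          </xsd:complexType>"
        + "        </xsd:element>"
        + "      </xsd:sequence>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if the document conforms to the schema; a validation error yields false.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One side effect worth noting: because name is declared globally (a prerequisite for ref), a document consisting of a lone `<name>` element is also accepted as valid, since any global element declaration may serve as the root.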
- Extending a schema: xsd:include / xsd:import
N.B.
- Use xsd:include to bring in a schema from the same namespace or from no namespace.
- Use xsd:import to bring in a schema from a different namespace (see later).
- Schema-based Validation
- Validation strategy
There are two main approaches to validation: given a validator (operator) and an object (the XML document to be validated), who chooses the reference (or norm, or standard) to be used for validation?
- Strategy A: the reference is chosen by the validator, i.e. the validation process takes two arguments: validate the object against the (explicitly chosen) reference;
- Strategy B: the reference is provided by the object, and the validation process takes only one argument: validate the object (against the implicitly given reference).
In particular, with strategy B, validation may be performed by the parser.
Clearly, the two strategies reflect different attitudes towards validation.
Additionally:
- DTDs are meant to be used with strategy B.
- An indication of the DTD is an integral part of the specified XML document.
- For XML files, there is a specific syntactic device: <!DOCTYPE...>
- For org.w3c.dom.Document objects, the DOM includes an interface for DTD objects, called org.w3c.dom.DocumentType, together with a method createDocumentType(...) in the org.w3c.dom.DOMImplementation interface.
Accordingly: xmllint --valid myFile.xml
See here for a Java implementation of DTD-based validation by the parser.
- You need a special tool like xmllint to validate against a "foreign" DTD.
"xmllint myFile.xml --dtdvalid myGrammar.dtd" will use myGrammar.dtd as reference, even if myFile.xml does contain a (pointer to a) DTD.
Actually, the "illogical" invocation "xmllint --valid myFile.xml --dtdvalid myGrammar.dtd" will also use myGrammar.dtd.
- By contrast, XML Schemas are meant to be used with strategy A (see below).
From a historical perspective, there is clearly a shift in technology from B to A.
- However, there is a specific syntax for attaching a schema to a file, so that strategy B can be used as well.
This feature is used by Eclipse.
Note that this is no longer possible with the third validation framework, Relax-NG.
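The Java implementation of DTD-based validation by the parser mentioned above is not reproduced here. As a stand-in, here is a minimal sketch of strategy B with JAXP, assuming the DTD is carried by the document itself (an internal subset in the DOCTYPE): DocumentBuilderFactory.setValidating(true) makes the parser validate against whatever DTD the document names, and an ErrorHandler turns validity errors into exceptions. The class name DtdValidation is hypothetical; the test documents reuse a fragment of the Car DTD from the Grammar vs. Type System section.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Strategy B: the parser validates against the DTD provided by the document itself.
class DtdValidation {
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation during parsing
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Without a handler, validity errors are only reported, not fatal; make them throw.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid w.r.t. its own DTD
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note the asymmetry with strategy A: the method takes a single argument, and a document that names no DTD at all cannot be validated (the parser reports that no grammar was found).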
- Schema-based validation with strategy A
- Validation with xmllint
% xmllint --noout myFile.xml --schema mySchema.xsd
- Validation with javax.xml.validation
- The basic mechanism is set up in class SchemaValidate.
Note that the Validator object does nothing visible if validation succeeds, and (with the default error handling) raises an exception (org.xml.sax.SAXException) if it fails.
As a consequence, a failed validation stops the process unless the exception is caught.
To fully appreciate this feature, compare with PHP's handling of the same problem, where DOMDocument::schemaValidate returns a boolean value, does not stop the process, and leaves error handling to the programmer: Average_1V.php (see below).
- Here is a simple illustration of the "natural" use of this technique: checking the validity of a document before using it.
We apply it to our first example, Average_1, which computes the average mark of a Names & Marks, model 1 document: see Average_1V.
Note that we validate the parsed Document (to avoid reading the file from disk twice), by means of a DOMSource.
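The SchemaValidate and Average_1V sources are not reproduced here. The following minimal sketch shows the same validate-then-use pattern, assuming a hypothetical, simplified marks format (`<marks><mark>12</mark>...</marks>`) rather than the actual Names & Marks model 1: the document is parsed once, the in-memory Document is validated through a DOMSource, and only then is the average computed. The class name AverageValidated is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical simplified format: <marks><mark>12</mark><mark>15</mark></marks>
class AverageValidated {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='marks'><xsd:complexType><xsd:sequence>"
        + "    <xsd:element name='mark' type='xsd:decimal' maxOccurs='unbounded'/>"
        + "  </xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static double average(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // DOM validation expects namespace-aware nodes
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Validate the already-parsed tree: returns silently on success,
            // throws SAXException on the first error (no ErrorHandler is set).
            SCHEMA.newValidator().validate(new DOMSource(doc));

            // Only reached for valid documents: at least one numeric <mark>.
            NodeList marks = doc.getElementsByTagName("mark");
            double sum = 0;
            for (int i = 0; i < marks.getLength(); i++) {
                sum += Double.parseDouble(marks.item(i).getTextContent().trim());
            }
            return sum / marks.getLength();
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid document: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Rethrowing the SAXException as an unchecked IllegalArgumentException is a design choice of this sketch; it keeps the "failed validation stops the process" behaviour described above, in contrast with PHP's boolean result.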
- Schema-based validation with strategy B
- Attaching a schema by means of a specific attribute of the root tag, belonging to another namespace (http://www.w3.org/2001/XMLSchema-instance):
<?xml version="1.0" ?>
<myRootTag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="URL-or-Path-pointing-to/mySchema.xsd">
- Java implementation
- Eclipse
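In Java, strategy B with XML Schemas can be sketched with the no-argument SchemaFactory.newSchema(), which (for the W3C XML Schema factory) returns a Schema that locates schemas through the xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints found in the instance. This is a minimal sketch, assuming the JDK's default settings allow loading an external schema from a file: URL; the class name SchemaFromInstance, the helper methods, and the temporary schema file are hypothetical scaffolding for the demo.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaFromInstance {
    // Hypothetical schema: <myRootTag> holds text only.
    static final String XSD = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='myRootTag' type='xsd:string'/></xsd:schema>";

    // Demo scaffolding: writes the schema to a temporary file and returns its URI.
    static String writeSchema() {
        try {
            Path xsd = Files.createTempFile("mySchema", ".xsd");
            Files.write(xsd, XSD.getBytes(StandardCharsets.UTF_8));
            xsd.toFile().deleteOnExit();
            return xsd.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds an instance that points at its own schema, as in the snippet above.
    static String document(String schemaUri, String content) {
        return "<myRootTag xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:noNamespaceSchemaLocation='" + schemaUri + "'>" + content + "</myRootTag>";
    }

    // Strategy B: one argument only; the schema is found through the document's hint.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with the strategy A code, the only change is that no schema source is passed to the factory: the reference now travels with the document.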
- Grammar vs. Type System
Two points of view on essentially the same contents.
- A Grammar specifies how entities are composed, in a top-down fashion.
- In Computer Science, the word grammar usually refers to context-free grammars, used to describe the concrete syntax of programming languages.
Such grammars describe the structure of character strings (programs).
- DTDs are tree grammars, describing the structure of XML trees.
An XML tree rooted at Car will be deemed correct with respect to this DTD if
- its root tag conforms to the grammar rule
<!ELEMENT Car (Body, Engine, Transmission)>
<!ATTLIST Car make CDATA #REQUIRED>
<!ATTLIST Car model CDATA #REQUIRED>
- the 3 child nodes of the root tag all conform to the respective rules
<!ELEMENT Body (Hood)>
<!ATTLIST Body color CDATA #REQUIRED>
<!ELEMENT Engine (Cylinders, Ignition)>
<!ELEMENT Transmission (GearBox, FrontAxle, RearAxle)>
<!ATTLIST Transmission type (automatic | manual) #REQUIRED>
<!ATTLIST Transmission gear_nb (3 | 4 | 5) #REQUIRED>
- the 6 grandchild nodes of the root also conform to the respective rules
<!ELEMENT Hood (#PCDATA)>
<!ELEMENT Cylinders EMPTY>
<!ELEMENT Ignition (#PCDATA)>
<!ELEMENT GearBox EMPTY>
<!ELEMENT FrontAxle EMPTY>
<!ELEMENT RearAxle EMPTY>
- The idea of a Type System comes from programming languages: given a program, the aim is to attach to each construct of the program a qualification (called a type) in a bottom-up fashion.
The type-checking process is conducted according to typing rules.
The program will be correct if the type-checking process succeeds in assigning a type to the whole program.
An XML Schema is a type system.
For instance, the same XML tree will be checked by defining a type for each subtree and for each attribute:
-