XML Schemas and Namespaces
Jean-François Perrot
- Basic Orientation
- targetNamespace
- elementFormDefault
- Definition vs. Usage
- Definition
- Usage
- Consequence for XML documents using several namespaces
- The main schema.
- Defining epita:mark
- Defining xml:base
- Qualified / Unqualified
- Using unqualified attributes
- Global / local definitions
- Returning to our example
- Validation and namespace-awareness (Java)
- Validation of an XML document read from a file
- Namespace awareness
- A technical detail...
- Not so technical, after all...
- Basic Orientation
The creators of XMLS adopted the view that
- a schema should define (the rules governing the use of) a vocabulary
- the whole of this vocabulary should belong to a single namespace.
- targetNamespace
Accordingly, this unique namespace is indicated by the targetNamespace attribute of the xsd:schema root element.
Example: we put our familiar Names & Marks example into our favorite epita namespace (http://epita/masters/international/), either
- as a default namespace: NMark1.ds.xml
- with a prefix: NMark1.ns.xml (where, following common usage, only the tags bear the prefix, not the attributes)
and we turn any of the schemas that we built previously for this type of document, e.g. the Russian doll one NM1-1.xsd, into an epita-oriented one, NM1-1.ns.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://epita/masters/international/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xsd:element name="list">
<xsd:complexType>
.........no change from the previous version.......
</xsd:complexType>
</xsd:element>
</xsd:schema>
- elementFormDefault
Note the presence of additional information about the status of tags (elements) and attribute names with respect to the namespace:
- tag names explicitly belong to the namespace, i.e. they must bear a prefix (unless the namespace is the default one)
---> elementFormDefault="qualified"
- attribute names must not bear a prefix
---> attributeFormDefault="unqualified"
However, this second piece of information may be left implicit, since "unqualified" is the default value.
This is a complicated matter; we shall come back to it later...
- Definition vs. Usage
Also note that in our schema document the epita namespace URL appears only as the value of the targetNamespace attribute, not as a namespace for the document (i.e. as xmlns...).
Things would have been different if we had chosen to use another form of the schema, such as NM1-2.xsd.
The version "with namespace" is NM1-2.ns.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://epita/masters/international/"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns:epita="http://epita/masters/international/">
<!-- Explicit types -->
<xsd:element name="list" type="epita:listType"/>
<xsd:complexType name="listType">
<xsd:choice>
<xsd:element name="student"
type="epita:studType" minOccurs="0" maxOccurs="unbounded"/>
</xsd:choice>
</xsd:complexType>
<xsd:simpleType name="markOver20">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive
value="0"/>
<xsd:maxInclusive
value="20"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="studType">
<xsd:attribute name="name" type="xsd:string"/>
<xsd:attribute name="mark" type="epita:markOver20"/>
</xsd:complexType>
</xsd:schema>
This is because the presence of a namespace reveals the profound
difference between defining a name and using an already
defined one in XMLS.
- Definition occurs with the name attribute: <xsd:element name="liste"...>, <xsd:attribute name="nom"...>, <xsd:complexType name="typEleve"...>.
The value of this attribute must be an 'NCName', i.e. a "non-colonized" name, from which colons (and a few other characters) are excluded.
This is natural enough, since such a name automatically belongs to the target namespace: no need to qualify it!
In our Russian doll schema, all the names that occurred were either being defined or belonged to the xsd namespace, so that there was no need to introduce the epita namespace...
- Usage occurs when a name is used to indicate a value, either previously known (e.g. "xsd:integer"), or defined somewhere in the same document. Such a name must be qualified!
This is the case here for the 3 types that are defined and used to further define elements and attributes.
The situation would have been the same if we had used references instead of types, as in NM2-4.xsd.
If the schema uses names that are first defined in itself, then these names automatically belong to the target namespace, and in order to qualify them the same URL must appear in a namespace definition "xmlns...".
However, an explicit prefix (as in our example) is not necessary: a default namespace will do, e.g. NM1-2.ds.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://epita/masters/international/"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns="http://epita/masters/international/">
<!-- Explicit types -->
<xsd:element name="list" type="listType"/>
..... no change from the namespace-less version .....
</xsd:schema>
- Consequence for XML documents using several namespaces
To describe such a document, the schema must be a combination of k different schemas, where k is the number of namespaces.
The specific tool for such a construction is
<xsd:import schemaLocation="mySchema.xsd"
namespace="myNamespaceURI" />
Of course, the imported namespaces must also be declared as xmlns in the root tag.
See a well-presented example in the XML Schema Tutorial, Part 4.
Here is such a construction for our Names & Marks in RDF form.
Note that there are 3 namespaces involved:
- rdf, i.e. "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
- epita, i.e. "http://epita/masters/international/";
- and xml, because of the xml:base attribute!
Recall that this namespace prefix is supposed to be implicitly defined in every XML document, the corresponding URL being http://www.w3.org/XML/1998/namespace.
- The main schema.
Since it defines the root tag of the document, its target namespace must be rdf, i.e. "http://www.w3.org/1999/02/22-rdf-syntax-ns#".
Note that attributeFormDefault will be "qualified".
The epita and xml namespaces will be used for imported names.
NNrdf.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
elementFormDefault="qualified"
attributeFormDefault="qualified"
xmlns:epita="http://epita/masters/international/">
<!-- xml namespace "http://www.w3.org/XML/1998/namespace" is default-defined -->
<xsd:import schemaLocation="Mark.xsd"
namespace="http://epita/masters/international/" />
<xsd:import schemaLocation="Base.xsd"
namespace="http://www.w3.org/XML/1998/namespace" />
<!-- Russian Doll Design -->
<xsd:element name="RDF">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Description" minOccurs="0"
maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element
ref="epita:mark" />
</xsd:sequence>
<xsd:attribute name="about">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:pattern value="#\p{Lu}\p{Ll}+(-\p{Lu}\p{Ll}+)"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute
ref="xml:base" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:schema>
- Defining epita:mark
We need epita as target namespace, and elementFormDefault="qualified".
Mark.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://epita/masters/international/"
elementFormDefault="qualified">
<xsd:element name="mark">
<xsd:simpleType>
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="0"/>
<xsd:maxInclusive value="20"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:schema>
- Defining xml:base
We need xml as target namespace, and attributeFormDefault="qualified".
Base.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://www.w3.org/XML/1998/namespace"
attributeFormDefault="qualified">
<xsd:attribute name="base" type="xsd:string" />
</xsd:schema>
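The import mechanism shown in these schemas can be tried out from Java. What follows is only a minimal sketch under stated assumptions: the class name ImportDemo and the helper rdf() are hypothetical, the schemas are cut-down versions of NNrdf.xsd and Mark.xsd (the xml:base import and the about attribute are left out for brevity), and they are written to a temporary directory so that the relative schemaLocation="Mark.xsd" can be resolved next to the main schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ImportDemo {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EPITA = "http://epita/masters/international/";
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Cut-down Mark.xsd: one global element in the epita namespace.
    static final String MARK_XSD =
        "<xsd:schema xmlns:xsd='" + XSD_NS + "' targetNamespace='" + EPITA + "'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='mark'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='20'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element>"
      + "</xsd:schema>";

    // Cut-down NNrdf.xsd: targets rdf, imports epita, refers to epita:mark.
    static final String MAIN_XSD =
        "<xsd:schema xmlns:xsd='" + XSD_NS + "' targetNamespace='" + RDF + "'"
      + " xmlns:epita='" + EPITA + "' elementFormDefault='qualified'>"
      + "<xsd:import schemaLocation='Mark.xsd' namespace='" + EPITA + "'/>"
      + "<xsd:element name='RDF'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Description' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType><xsd:sequence><xsd:element ref='epita:mark'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Hypothetical helper: a one-student RDF document carrying the given mark.
    static String rdf(int mark) {
        return "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:epita='" + EPITA + "'>"
             + "<rdf:Description><epita:mark>" + mark + "</epita:mark></rdf:Description>"
             + "</rdf:RDF>";
    }

    // Returns true when the document is valid against the two-schema combination.
    static boolean isValid(String xml) {
        try {
            Path dir = Files.createTempDirectory("nsdemo");
            Files.write(dir.resolve("Mark.xsd"), MARK_XSD.getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("NNrdf.xsd");
            Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(main.toFile())
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema or validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mark 12 valid? " + isValid(rdf(12)));
        System.out.println("mark 25 valid? " + isValid(rdf(25)));
    }
}
```

The second document violates the maxInclusive facet declared in the imported schema, so a rejection there suggests that the epita:mark definition really came in through xsd:import.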
- Qualified / Unqualified
- Using unqualified attributes is the most frequent situation. Indeed, whereas the elementFormDefault="qualified" declaration is usual, the corresponding attributeFormDefault="unqualified" is the default option and is often omitted.
We have seen two such schemas earlier: NM1-1.ns.xsd and NM1-2.ns.xsd (or NM1-2.ds.xsd), without any particular feature.
However, things are more complex when unqualified attributes are defined by xsd elements that are direct children of the <xsd:schema...> root tag (so-called top-level definitions, or global definitions).
Take for example Car-2.xsd: the top-level tags of this schema are:
<xsd:element name="Car">
<xsd:element name="Body">
<xsd:element name="Engine">
<xsd:element name="Transmission">
<xsd:attribute name="gear_nb">
<xsd:attribute name="type">
of which the last two define attributes, not elements.
If these two attributes are to be kept unqualified, how do we express in the schema that they do not bear any namespace?
Suppose that we proceed as we did in NM1-2.ds.xsd, i.e. with a default namespace:
Car-2NUb.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace =
"http://epita/masters/international/"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns="http://epita/masters/international/">
<xsd:element name="Car">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="Body" />
<xsd:element ref="Engine" />
<xsd:element ref="Transmission" />
</xsd:sequence>
<xsd:attribute name="make"
type="xsd:string"/>
<xsd:attribute name="model"
type="xsd:string"/>
</xsd:complexType>
</xsd:element>
..... same for the 3 other elements....
<xsd:attribute name="gear_nb">
<xsd:annotation>
<xsd:documentation>Including reverse</xsd:documentation>
</xsd:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:int">
<xsd:minInclusive value="4"/>
<xsd:maxInclusive value="6"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="type">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="automatic"/>
<xsd:enumeration value="manual"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:schema>
When we use it to validate e.g. CarNU.xml, we get an error message on the line <Transmission type="automatic" gear_nb="5">:
Multiple annotations found at this line:
- cvc-complex-type.3.2.2: Attribute 'type' is not allowed to appear in element 'Transmission'.
- cvc-complex-type.3.2.2: Attribute 'gear_nb' is not allowed to appear in element 'Transmission'.
Howzat?
- Global / local definitions
See Massimo Franceschet (Univ. Udine, Italy) and Dare Obasanjo (Microsoft) for other ways to tell this complicated story.
xsd elements that are direct children of the <xsd:schema...> root tag are called top-level definitions, or global definitions. The other ones are local.
With regard to namespaces, they follow different rules:
- by default
  - global declarations validate elements or attributes with a namespace name (the target namespace of the schema),
  - while local declarations validate elements or attributes without a namespace name.
- the default behavior of local declarations can be overridden by 2 mechanisms:
  - the elementFormDefault and attributeFormDefault attributes of <xsd:schema>, with values qualified and unqualified. They specify whether local declarations in the schema should validate namespace-qualified elements and attributes respectively.
  - the form attribute on local element and attribute declarations, also with values qualified and unqualified, which can be used to override at the individual level the general decision taken by elementFormDefault and attributeFormDefault.
- there is no way to override the default policy for global declarations.
- Returning to our example, we see that our global attribute definitions must fit instances with a namespace... so that there is a contradiction between our desire expressed by attributeFormDefault="unqualified" and our decision to define attributes at the global level in the schema.
More generally, one should formulate this as a "best practice" rule:
Never define globally attributes that you want to keep unqualified.
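The rule can be checked with a few lines of JAXP. This is a minimal sketch, not the Car-2NUb.xsd schema itself: the class name GlobalAttributeDemo is hypothetical, the schema is reduced to one element and one globally declared attribute, and both schema and documents are held in memory as strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class GlobalAttributeDemo {
    static final String NS = "http://epita/masters/international/";

    // Cut-down schema: the attribute 'type' is declared at top level (globally),
    // so it belongs to the target namespace whatever attributeFormDefault says.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' xmlns:e='" + NS + "'"
      + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
      + "<xsd:element name='Transmission'>"
      + "<xsd:complexType><xsd:attribute ref='e:type'/></xsd:complexType>"
      + "</xsd:element>"
      + "<xsd:attribute name='type' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true when the document is valid against XSD.
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. cvc-complex-type.3.2.2
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String unqualified = "<Transmission xmlns='" + NS + "' type='automatic'/>";
        String qualified = "<e:Transmission xmlns:e='" + NS + "' e:type='automatic'/>";
        System.out.println("unqualified attribute valid? " + isValid(unqualified)); // false
        System.out.println("qualified attribute valid?   " + isValid(qualified));   // true
    }
}
```

Only the instance whose attribute carries the prefix is accepted, which is exactly the contradiction with attributeFormDefault="unqualified" described above.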
Should we abandon our design?
There is a way out, which is to treat "unqualified" as meaning "belonging to no-namespace", and to consider our schema as referring to 2 namespaces:
- "http://epita/masters/international/" for our elements
- and "no-namespace" for our "top-level defined" attributes.
Following the process exemplified in sect. 3, we write a "no-namespace" schema
Attr.xsd
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:attribute name="gear_nb">
<xsd:annotation>
<xsd:documentation>Including reverse</xsd:documentation>
</xsd:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:int">
<xsd:minInclusive value="4"/>
<xsd:maxInclusive value="6"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="type">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="automatic"/>
<xsd:enumeration value="manual"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:schema>
which we import into the "main" schema
Car-2NU.xsd
(note that, in order to keep our globally defined attributes - when they are imported - out of the targetNamespace,
a namespace prefix must be used in the schema! )
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://epita/masters/international/"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns:e="http://epita/masters/international/" >
<xsd:import schemaLocation="Attr.xsd" /> <!-- without "namespace", of course! -->
<xsd:element name="Car">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="e:Body" />
<xsd:element ref="e:Engine" />
<xsd:element ref="e:Transmission" />
</xsd:sequence>
<xsd:attribute name="make" type="xsd:string"/>
<xsd:attribute name="model" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
.....
<!-- attributes gear_nb and type are imported -->
</xsd:schema>
Exercise: rewrite schema Car-2NUb.xsd with a namespace prefix (instead of the present default namespace), and carefully observe what Eclipse has to say about it.
- Validation and namespace-awareness (Java)
- Validation of an XML document read from a file
(i.e. by
validator.validate(new StreamSource(new File(fileName)));
as in the VerifSchema example)
works equally well with or without a namespace.
Just try!
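For readers who want to try without the course files, here is a minimal sketch under stated assumptions: the class name StreamValidationDemo is hypothetical, the schema is a cut-down Russian doll schema in the spirit of NM1-1.ns.xsd, and the documents are read from strings rather than from files (a StringReader takes the place of the File).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class StreamValidationDemo {
    static final String NS = "http://epita/masters/international/";

    // Cut-down Russian doll schema with the epita target namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xsd:element name='list'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='student' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType>"
      + "<xsd:attribute name='name' type='xsd:string'/>"
      + "<xsd:attribute name='mark' type='xsd:integer'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // The same content, once with a default namespace, once with a prefix.
    static final String DEFAULT_NS_DOC =
        "<list xmlns='" + NS + "'><student name='Toto' mark='12'/></list>";
    static final String PREFIXED_DOC =
        "<epita:list xmlns:epita='" + NS + "'>"
      + "<epita:student name='Toto' mark='12'/></epita:list>";

    // Validates a document supplied as a character stream.
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default namespace: " + isValid(DEFAULT_NS_DOC));
        System.out.println("prefixed:          " + isValid(PREFIXED_DOC));
    }
}
```

With a StreamSource the validator does its own namespace-aware parsing, so both spellings of the document should be accepted.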
- Namespace awareness
Validation of a Document object, either previously parsed from a file or built with DOM
(i.e. by validator.validate(new DOMSource(myDocument)); as in Average_1V),
requires that the said object be namespace-aware (see Namespaces in DOM-Java).
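The requirement can be observed in isolation with the sketch below. It is only an illustration under assumptions: the class name DomValidationDemo is hypothetical, the schema is a cut-down version with the epita target namespace, and the document is parsed from a string, with setNamespaceAware as the only thing that varies.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomValidationDemo {
    static final String NS = "http://epita/masters/international/";

    // Cut-down schema with the epita target namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xsd:element name='list'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='student' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType>"
      + "<xsd:attribute name='name' type='xsd:string'/>"
      + "<xsd:attribute name='mark' type='xsd:integer'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    static final String DOC =
        "<list xmlns='" + NS + "'><student name='Toto' mark='12'/></list>";

    // Parses DOC into a DOM tree, then validates that tree through a DOMSource.
    static boolean validatesAsDom(boolean namespaceAware) {
        Document doc;
        Validator validator;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware); // the only difference between runs
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(DOC)));
            validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
        } catch (Exception e) {
            throw new IllegalStateException("set-up failed", e);
        }
        try {
            validator.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false; // typically cvc-elt.1: declaration of the root element not found
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware DOM valid?     " + validatesAsDom(true));
        System.out.println("non-namespace-aware DOM valid? " + validatesAsDom(false));
    }
}
```

In the non-aware tree the root element carries no namespace URI, so the validator looks for a declaration of a no-namespace 'list' and finds none, although the text of the document is perfectly valid.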
Illustration:
- without namespace-awareness (Average_1V as is), mysterious failure of a perfectly valid file...
jfp$ java Average_1V Nom_note1.ds.xml Nom_note1-2.ns.xsd
Invalid file : cvc-elt.1 : Déclaration de l'élément 'liste' introuvable.
- with namespace-awareness (Average_1VA = Average_1V corrected along the lines of Documents created...), all right
jfp$ java Average_1VA Nom_note1.ds.xml
Nom_note1-2.ns.xsd
Toto's mark is 12
Tata's mark is 13
Tutu's mark is 17
Tutu's mark is 11
Average is : 13.0
- but if we try with a file which is supposed to be
equivalent - and the same schema - we get a very different result :
jfp$ java Average_1VA Nom_note1.ns.xml
Nom_note1-2.ns.xsd
Exception in thread "main" java.lang.Exception: Empty contents
at Average_1VA.average_1(Average_1VA.java:45)
at Average_1VA.main(Average_1VA.java:87)
????????????
- A technical detail...
That's because we forgot the warning issued in DOM manipulation of namespace aware documents, to use
getElementsByTagNameNS(String namespaceURI, String localName)
instead of getElementsByTagName(String name)
Now, where do we find the namespaceURI?
The JavaDoc for the Node interface has a bizarre comment about the getNamespaceURI() method:
This is not a computed value that is the result of a namespace lookup based on an examination of the namespace declarations in scope. It is merely the namespace URI given at creation time.
For nodes of any type other than ELEMENT_NODE and ATTRIBUTE_NODE and nodes created with a DOM Level 1 method, such as Document.createElement(), this is always null.
However, it works!
We modify the first lines of the average_1 method as follows (Average_1VAN):
public static float average_1(Document doc) throws Exception {
    Element start = doc.getDocumentElement();
    String ns = start.getNamespaceURI();
    if (ns == null) throw new Exception("no namespace");
    int k = 0; // number of students/marks
    int s = 0; // total
    NodeList students = start.getElementsByTagNameNS(ns, "student");
    ......... the rest of the code unchanged .........
and finally
jfp$ java Average_1VAN Nom_note1.ns.xml
Nom_note1-2.ns.xsd
Toto's mark is 12
Tata's mark is 13
Tutu's mark is 17
Tutu's mark is 11
Average is : 13.0
jfp$ java Average_1VAN Nom_note1.ds.xml
Nom_note1-2.ns.xsd
Toto's mark is 12
Tata's mark is 13
Tutu's mark is 17
Tutu's mark is 11
Average is : 13.0
OK, this time our class works for both equivalent files (with prefix, with default), as was (of course) expected.
That Average_1VA should work for one and not for the other shows how easily namespaces trip up DOM code in JAXP: getElementsByTagName compares its argument with the qualified tag name, prefix included, so "student" matches <student> under a default namespace but not a prefixed tag such as <epita:student>.
PHP is less erratic: the same Average_1V.php also works with one namespace (with a prefix as well as with a default).
But of course, if several namespaces are involved, use of getElementsByTagNameNS and Co. is required!
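The difference between the two lookup methods can be seen on a pair of tiny documents. This is a minimal sketch with assumed names: the class TagLookupDemo, the epita prefix and the in-memory documents are illustrative, not taken from the Average_1V sources.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TagLookupDemo {
    static final String NS = "http://epita/masters/international/";

    // Two equivalent documents: default namespace vs. explicit prefix.
    static final String DEFAULT_NS_DOC =
        "<list xmlns='" + NS + "'>"
      + "<student name='Toto' mark='12'/><student name='Tata' mark='13'/></list>";
    static final String PREFIXED_DOC =
        "<epita:list xmlns:epita='" + NS + "'>"
      + "<epita:student name='Toto' mark='12'/><epita:student name='Tata' mark='13'/>"
      + "</epita:list>";

    // Namespace-aware parse of an in-memory document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Lookup by qualified name: the prefix, if any, is part of the name.
    static int countByTagName(String xml) {
        return parse(xml).getElementsByTagName("student").getLength();
    }

    // Lookup by namespace URI and local name: independent of the prefix.
    static int countByTagNameNS(String xml) {
        return parse(xml).getElementsByTagNameNS(NS, "student").getLength();
    }

    public static void main(String[] args) {
        System.out.println(countByTagName(DEFAULT_NS_DOC));   // 2
        System.out.println(countByTagName(PREFIXED_DOC));     // 0
        System.out.println(countByTagNameNS(DEFAULT_NS_DOC)); // 2
        System.out.println(countByTagNameNS(PREFIXED_DOC));   // 2
    }
}
```

The zero in the second line corresponds to the "Empty contents" exception seen earlier: with a prefix, no tag is literally named "student".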
- Not so technical, after all...
Indeed, with more than one namespace, this heavy machinery is clearly justified.
Let's see what happens to the Average_R class (for Names & Marks in RDF form) if we want validation.
What changes from Average_R is
public static float average_rdf(Document doc, PrintWriter sortie, String rdfURI, String epitaURI) throws Exception {
    Element start = doc.getDocumentElement();
    int k = 0; // number of students/marks
    int s = 0; // total
    NodeList eleves = start.getElementsByTagNameNS(rdfURI, "Description");
    for (int i = 0; i < eleves.getLength(); i++) {
        Element eleve = (Element) eleves.item(i); // explicit cast!
        String name = eleve.getAttributeNS(rdfURI, "about").substring(1); // drop the '#' sign
        Element note = (Element) eleve.getElementsByTagNameNS(epitaURI, "mark").item(0);
        .......... unchanged .............
}// average_rdf
the rest is taken from Average_1VAN.
Whole code: Average_RV.
The same in DOM-PHP: Average_RV.php
Beware of namespaces!