XML Schemas and Namespaces

Jean-François Perrot

Basic Orientation
1. targetNamespace
2. elementFormDefault
Definition vs. Usage
1. Definition
2. Usage
Consequence for XML documents using several namespaces
Qualified / Unqualified
Validation and namespace-awareness (Java)

Basic Orientation
The creators of XMLS adopted the view that
- a schema should define (the rules governing the use of) a vocabulary
- the whole of this vocabulary should belong to a single namespace.
1. targetNamespace
  Accordingly, this unique namespace is indicated by the targetNamespace attribute of the xsd:schema root element.
  Example : we put our familiar Names & Marks example into our favorite epita namespace (http://epita/masters/international/)
  either
  - as a default namespace : NMark1.ds.xml
  - with prefix : NMark1.ns.xml
    (where it appears that, following common usage, only the tags bear the prefix, not the attributes)
  and we turn any of the schemas that we built previously for this type of document, e.g. the Russian doll one NM1-1.xsd, into an epita-oriented one
  NM1-1.ns.xsd
  
  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://epita/masters/international/"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
    <xsd:element name="list">
      <xsd:complexType>
        .........no change from the previous version.......
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
2. elementFormDefault
  Note the presence of additional information about the status of tags (elements) and attribute names with respect to the namespace :
  - tag names explicitly belong to the namespace, i.e. they must bear a prefix (unless the namespace is default)
    ---> elementFormDefault="qualified"
  - attribute names must not bear a prefix
    ---> attributeFormDefault="unqualified"
    However, this second piece of information may be left implicit.
  This is a complicated matter; we shall come back to it later ...
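The effect of targetNamespace together with the two "FormDefault" settings can be checked with a few lines of Java. What follows is only a minimal sketch: the class name TargetNamespaceSketch, its isValid helper and the cut-down schema (a "list" of "student" elements) are illustrative assumptions, not the actual NM1-1.ns.xsd.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class TargetNamespaceSketch {
    static final String NS = "http://epita/masters/international/";

    // Cut-down, illustrative schema (assumption): a "list" of "student" elements.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "'"
      + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
      + "<xsd:element name='list'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='student' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType>"
      + "<xsd:attribute name='name' type='xsd:string'/>"
      + "<xsd:attribute name='mark' type='xsd:integer'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    /** Returns true if xml is valid against XSD, false if a validation error is reported. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String student = "student name='Toto' mark='12'/>";
        // Default namespace: tags are qualified without a visible prefix.
        System.out.println(isValid("<list xmlns='" + NS + "'><" + student + "</list>"));
        // Explicit prefix on the tags, none on the attributes.
        System.out.println(isValid("<e:list xmlns:e='" + NS + "'><e:" + student + "</e:list>"));
        // No namespace at all: the root element is not declared.
        System.out.println(isValid("<list><" + student + "</list>"));
        // Prefixed attribute: rejected, since attributes are unqualified.
        System.out.println(isValid("<e:list xmlns:e='" + NS + "'><e:student e:name='Toto'/></e:list>"));
    }
}
```

The first two documents should be accepted and the last two rejected, which is exactly what "qualified" elements and "unqualified" attributes mean.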
Definition vs. Usage
Also note that in our schema document the epita namespace URL appears only as the value of the targetNamespace attribute,
not as a namespace for the document (i.e. as xmlns...).
Things would have been different if we had chosen to use another form of the schema, such as NM1-2.xsd.
The version "with namespace" is NM1-2.ns.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://epita/masters/international/"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    xmlns:epita="http://epita/masters/international/">

  <xsd:element name="list" type="epita:listType"/>

  <xsd:complexType name="listType">
    <xsd:choice>
      <xsd:element name="student" type="epita:studType" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:simpleType name="markOver20">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="20"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="studType">
    <xsd:attribute name="name" type="xsd:string"/>
    <xsd:attribute name="mark" type="epita:markOver20"/>
  </xsd:complexType>
</xsd:schema>

This is because the presence of a namespace reveals the profound difference between defining a name and using an already defined one in XMLS.
1. Definition
  occurs with the name attribute : <xsd:element name="liste"...><xsd:attribute name="nom"...> <xsd:complexType name="typEleve"...
  the value of this attribute must be an 'NCName', i.e. a "non-colonized" name from which colons (and a few other characters) are excluded.
  This is natural enough since such a name automatically belongs to the target namespace : no need to qualify it !
  
  In our Russian doll schema, all names that occurred were either being defined or belonging to the xsd namespace,
  so that there was no need to introduce the epita namespace...
2. Usage
  occurs when a name is used to indicate a value, either previously known (e.g. "xsd:integer" or "xsd:string"),
  or defined somewhere in the same document. Such a name must be qualified !
  This is the case here for the 3 types that are defined and used to further define elements and attributes.
  The situation would have been the same if we used references instead of types, as in NM2-4.xsd.
  
  If the schema uses names that are first defined in itself, then these names automatically belong to the target namespace ;
  in order to qualify them, the same URL must appear in a namespace declaration "xmlns...".
  However, an explicit prefix (as in our example) is not necessary : a default namespace will do, e.g. NM1-2.ds.xsd
  
  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://epita/masters/international/"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified"
      xmlns="http://epita/masters/international/">

    <xsd:element name="list" type="listType"/>
    ..... no change from the namespace-less version .....
  </xsd:schema>
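The rule "a used name must be qualified" can be observed when the schema itself is compiled. The sketch below (class QNameUsageSketch and its helpers are hypothetical names, and the schema is a reduced stand-in for NM1-2) builds three variants of the same schema: one with an explicit prefix, one with a default namespace, and one where the target namespace is never declared with xmlns, so that the reference "listType" points into no namespace.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class QNameUsageSketch {
    static final String NS = "http://epita/masters/international/";

    /** Builds a reduced schema; rootAttrs holds extra xmlns declarations, typeRef the type reference. */
    static String schemaText(String rootAttrs, String typeRef) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='" + NS + "' elementFormDefault='qualified' " + rootAttrs + ">"
             + "<xsd:element name='list' type='" + typeRef + "'/>"          // usage: a QName
             + "<xsd:complexType name='listType'><xsd:sequence>"           // definition: an NCName
             + "<xsd:element name='student' type='xsd:string' minOccurs='0' maxOccurs='unbounded'/>"
             + "</xsd:sequence></xsd:complexType>"
             + "</xsd:schema>";
    }

    /** Returns true if the schema text compiles, false if the schema processor reports an error. */
    static boolean compiles(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Explicit prefix bound to the target namespace.
        System.out.println(compiles(schemaText("xmlns:epita='" + NS + "'", "epita:listType")));
        // Default namespace bound to the target namespace.
        System.out.println(compiles(schemaText("xmlns='" + NS + "'", "listType")));
        // Target namespace never declared: "listType" cannot be resolved.
        System.out.println(compiles(schemaText("", "listType")));
    }
}
```

Only the third variant should be refused, because the definition landed in the target namespace while the reference did not.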
Consequence for XML documents using several namespaces
To describe such a document, the schema must be a combination of k different schemas, where k is the number of namespaces.
The specific tool for such a construction is
<xsd:import schemaLocation="mySchema.xsd" namespace="myNamespaceURI" />

Of course the imported namespaces must also be declared as xmlns in the root tag.

See a well-presented example in the XML Schema Tutorial, Part 4.

Here is such a construction for our Names & Marks in RDF form.
Note that there are 3 namespaces involved :
- rdf, i.e. "http://www.w3.org/1999/02/22-rdf-syntax-ns#" ;
- epita, "http://epita/masters/international/" ;
- and xml because of the xml:base attribute !
  Recall that this namespace prefix is supposed to be implicitly defined in every XML document,
  the corresponding URL being http://www.w3.org/XML/1998/namespace.
1. The main schema.
  Since it defines the root tag of the document, its target namespace must be rdf, i.e. "http://www.w3.org/1999/02/22-rdf-syntax-ns#".
  Note that attributeFormDefault will be "qualified".
  The epita and xml namespaces will be used for imported names.
  
  NNrdf.xsd
  
  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      elementFormDefault="qualified"
      attributeFormDefault="qualified"
      xmlns:epita="http://epita/masters/international/">

    <xsd:import schemaLocation="Mark.xsd" namespace="http://epita/masters/international/" />
    <xsd:import schemaLocation="Base.xsd" namespace="http://www.w3.org/XML/1998/namespace" />

    <!-- Russian Doll Design -->
    <xsd:element name="RDF">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Description" minOccurs="0" maxOccurs="unbounded">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element ref="epita:mark" />
              </xsd:sequence>
              <xsd:attribute name="about">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:pattern value="#\p{Lu}\p{Ll}+(-\p{Lu}\p{Ll}+)"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:attribute>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
        <xsd:attribute ref="xml:base" use="required"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
2. Defining epita:mark
  We need epita as target namespace, and elementFormDefault="qualified".
  Mark.xsd
  
  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://epita/masters/international/"
      elementFormDefault="qualified">
    <xsd:element name="mark">
      <xsd:simpleType>
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="0"/>
          <xsd:maxInclusive value="20"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
  </xsd:schema>
3. Defining xml:base
  We need xml as target namespace, and attributeFormDefault="qualified".
  Base.xsd
  
  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.w3.org/XML/1998/namespace"
      attributeFormDefault="qualified">
    <xsd:attribute name="base" type="xsd:string" />
  </xsd:schema>
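To see xsd:import at work from Java, here is a rough sketch under simplifying assumptions: only the rdf and epita namespaces are kept (the xml:base attribute and the pattern on "about" are left out), and the two schema texts are written to a temporary directory so that the relative schemaLocation can be resolved. The class name ImportSketch and its helpers are hypothetical.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ImportSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EPITA = "http://epita/masters/international/";
    static final String XSD_OPEN = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'";

    // Imported schema: defines epita:mark, an integer between 0 and 20.
    static final String MARK_XSD = XSD_OPEN
      + " targetNamespace='" + EPITA + "' elementFormDefault='qualified'>"
      + "<xsd:element name='mark'><xsd:simpleType><xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='20'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element></xsd:schema>";

    // Main schema (simplified): rdf is the target namespace, epita is imported.
    static final String MAIN_XSD = XSD_OPEN
      + " targetNamespace='" + RDF + "' xmlns:epita='" + EPITA + "'"
      + " elementFormDefault='qualified' attributeFormDefault='qualified'>"
      + "<xsd:import schemaLocation='Mark.xsd' namespace='" + EPITA + "'/>"
      + "<xsd:element name='RDF'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Description' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType><xsd:sequence><xsd:element ref='epita:mark'/></xsd:sequence>"
      + "<xsd:attribute name='about' type='xsd:string'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    /** Writes both schema files side by side and compiles the main one. */
    static Schema loadSchema() throws Exception {
        Path dir = Files.createTempDirectory("nmrdf"); // not cleaned up in this sketch
        Files.write(dir.resolve("Mark.xsd"), MARK_XSD.getBytes(StandardCharsets.UTF_8));
        Path main = dir.resolve("NNrdf.xsd");
        Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(main.toFile());
    }

    /** Wraps one rdf:Description around the given child and validates the result. */
    static boolean isValid(String child) throws Exception {
        String xml = "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:e='" + EPITA + "'>"
                   + "<rdf:Description rdf:about='#Toto'>" + child + "</rdf:Description></rdf:RDF>";
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<e:mark>12</e:mark>"));     // mark in the epita namespace
        System.out.println(isValid("<e:mark>25</e:mark>"));     // out of range
        System.out.println(isValid("<rdf:mark>12</rdf:mark>")); // mark in the wrong namespace
    }
}
```

Only the first document should pass: the element must come from the imported namespace and obey the restriction defined there.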
Qualified / Unqualified
1. Using unqualified attributes
  is the most frequent situation. Indeed, whereas the elementFormDefault="qualified" declaration is usual, the corresponding attributeFormDefault="unqualified" is the default option and is often omitted.
  We have seen two such schemas earlier : NM1-1.ns.xsd and NM1-2.ns.xsd (or NM1-2.ds.xsd), without any particular feature.
  
  However, things are more complex when unqualified attributes are defined by xsd elements that are direct children of the <xsd:schema... > root tag
  (so-called top-level definitions, or global definitions). Take for example Car-2.xsd :
  the top-level tags of this schema are :
  1. <xsd:element name="Car">
  2. <xsd:element name="Body">
  3. <xsd:element name="Engine">
  4. <xsd:element name="Transmission">
  5. <xsd:attribute name="gear_nb">
  6. <xsd:attribute name="type">
  of which the two last ones define attributes, not elements.
  
  If these two attributes are to be kept unqualified, how do we express in the schema that they do not bear any namespace?
  Suppose that we proceed as we did in NM1-2.ds.xsd, i.e. with a default namespace:
  Car-2NUb.xsd
  
  <?xml version="1.0"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://epita/masters/international/"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified"
      xmlns="http://epita/masters/international/">

    <xsd:element name="Car">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="Body" />
          <xsd:element ref="Engine" />
          <xsd:element ref="Transmission" />
        </xsd:sequence>
        <xsd:attribute name="make" type="xsd:string"/>
        <xsd:attribute name="model" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>
    ..... same for the 3 other elements....

    <xsd:attribute name="gear_nb">
      <xsd:annotation>
        <xsd:documentation>Including reverse</xsd:documentation>
      </xsd:annotation>
      <xsd:simpleType>
        <xsd:restriction base="xsd:int">
          <xsd:minInclusive value="4"/>
          <xsd:maxInclusive value="6"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>

    <xsd:attribute name="type">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="automatic"/>
          <xsd:enumeration value="manual"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:schema>
  
  When we use it to validate e.g. CarNU.xml, we get an error message on the line <Transmission type="automatic" gear_nb="5">
  Multiple annotations found at this line:
      - cvc-complex-type.3.2.2: Attribute 'type' is not allowed to appear in element 'Transmission'.
      - cvc-complex-type.3.2.2: Attribute 'gear_nb' is not allowed to appear in element 'Transmission'.
  
  Howszat?
2. Global / local definitions
  See Massimo Franceschet (Univ. Udine, Italy) and Dare Obasanjo (Microsoft) for other ways to tell this complicated story.
  
  xsd elements that are direct children of the <xsd:schema... > root tag are called top-level definitions, or global definitions.
  The other ones are local.
  With regard to namespaces, they follow different rules :
  - by default
    - global declarations validate elements or attributes with a namespace name,
    - while local declarations validate elements or attributes without a namespace name.
  - the default behavior of local declarations can be overridden by 2 mechanisms
    - elementFormDefault and attributeFormDefault attributes of <xsd:schema>, with values qualified and unqualified.
      They specify whether local declarations in the schema should validate namespace qualified elements and attributes respectively.
    - The form attribute on local element and attribute declarations, also with values qualified and unqualified,
      can be used to override at the individual level the general decision taken by elementFormDefault and attributeFormDefault.
  - there is no way to override the default policy for global declarations.
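These rules can be tried out on a deliberately tiny schema. In the sketch below (class name GlobalAttributeSketch, the prefix "e" and the local attribute "kind" are assumptions made for the illustration), gear_nb is declared globally and referenced, whereas kind is declared locally; attributeFormDefault stays "unqualified".

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class GlobalAttributeSketch {
    static final String NS = "http://epita/masters/international/";

    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' xmlns:e='" + NS + "'"
      + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
      + "<xsd:attribute name='gear_nb' type='xsd:int'/>"       // global declaration
      + "<xsd:element name='Transmission'><xsd:complexType>"
      + "<xsd:attribute ref='e:gear_nb'/>"                      // reference to the global one
      + "<xsd:attribute name='kind' type='xsd:string'/>"        // local declaration
      + "</xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    /** Validates a Transmission element carrying the given attribute text. */
    static boolean isValid(String attributes) throws Exception {
        String xml = "<e:Transmission xmlns:e='" + NS + "' " + attributes + "/>";
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                     .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("e:gear_nb='5' kind='manual'")); // global qualified, local unqualified
        System.out.println(isValid("gear_nb='5'"));                 // global attribute without prefix
        System.out.println(isValid("e:kind='manual'"));             // local attribute with prefix
    }
}
```

Only the first combination should validate: the global attribute needs its namespace, while the local one must not have any.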
3. Returning to our example
  we see that our global attribute definitions must fit instances with a namespace...
  so that there is a contradiction between our desire expressed by attributeFormDefault="unqualified"
  and our decision to define attributes at the global level in the schema.
  More generally, one should formulate this as a "best practice" rule :
  Never define globally attributes that you want to keep unqualified.
  
  Should we abandon our design ?
  There is a way out, which is to treat "unqualified" as meaning "belonging to no-namespace",
  and to consider our schema as referring to 2 namespaces :
  - "http://epita/masters/international/" for our elements
  - and "no-namespace" for our "top-level defined" attributes.
  Following the process exemplified in sect. 3, we write a "no-namespace" schema
  Attr.xsd
  
  <?xml version="1.0" encoding="utf-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:attribute name="gear_nb">
      <xsd:annotation>
        <xsd:documentation>Including reverse</xsd:documentation>
      </xsd:annotation>
      <xsd:simpleType>
        <xsd:restriction base="xsd:int">
          <xsd:minInclusive value="4"/>
          <xsd:maxInclusive value="6"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
    <xsd:attribute name="type">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="automatic"/>
          <xsd:enumeration value="manual"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:schema>
  
  which we import into the "main" schema
  Car-2NU.xsd
  (note that, in order to keep our globally defined attributes - when they are imported - out of the targetNamespace,
  a namespace prefix must be used in the schema! )
  
  <?xml version="1.0" encoding="utf-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://epita/masters/international/"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified"
      xmlns:e="http://epita/masters/international/">

    <xsd:import schemaLocation="Attr.xsd" /> <!-- without "namespace", of course ! -->

    <xsd:element name="Car">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="e:Body" />
          <xsd:element ref="e:Engine" />
          <xsd:element ref="e:Transmission" />
        </xsd:sequence>
        <xsd:attribute name="make" type="xsd:string"/>
        <xsd:attribute name="model" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>
    .....
  </xsd:schema>
  
  Exercise : rewrite schema Car-2NUb.xsd with a prefix namespace (instead of the present default namespace),
  and carefully observe what Eclipse has to say about it.
Validation and namespace-awareness (Java)
1. Validation of an XML document read from a file
  (i.e. by validator.validate(new StreamSource(new File(fileName))); as in the VerifSchema example)
  works equally with or without namespace.
  Just try !
2. Namespace awareness
  Validation of a Document object, either previously parsed from file or built with DOM,
  (i.e. by validator.validate (new DOMSource (myDocument)); as in Average_1V )
  requires that the said object be namespace aware (see Namespaces in DOM-Java).
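A small self-contained sketch of this requirement follows. It is not the Average_1V code: the class name NamespaceAwareSketch, the reduced schema and the one-student document are assumptions made for the illustration. The only thing that varies between the two runs is the setNamespaceAware flag of the DocumentBuilderFactory.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceAwareSketch {
    static final String NS = "http://epita/masters/international/";

    // Reduced, illustrative schema and document (assumptions).
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xsd:element name='list'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='student' minOccurs='0' maxOccurs='unbounded'>"
      + "<xsd:complexType>"
      + "<xsd:attribute name='name' type='xsd:string'/>"
      + "<xsd:attribute name='mark' type='xsd:integer'/>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    static final String XML =
        "<list xmlns='" + NS + "'><student name='Toto' mark='12'/></list>";

    /** Parses XML into a DOM tree with the given setting, then validates the tree. */
    static boolean validatesAsDom(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware); // false is the factory default
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                     .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace aware     : " + validatesAsDom(true));
        System.out.println("not namespace aware : " + validatesAsDom(false));
    }
}
```

The same text is parsed twice; the tree built without namespace awareness carries no namespace URI on its nodes, so the validator is expected not to find a declaration for its root element.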
  
  Illustration :
  - without namespace-awareness (Average_1V as is), mysterious failure of a perfectly valid file
    (the French error message says that the declaration of element 'liste' cannot be found)...
    
    jfp$ java Average_1V Nom_note1.ds.xml Nom_note1-2.ns.xsd
    Invalid file : cvc-elt.1 : Déclaration de l'élément 'liste' introuvable.
  - with namespace-awareness (Average_1VA = Average_1V corrected along the lines of Documents created...), all right
    
    jfp$ java Average_1VA Nom_note1.ds.xml Nom_note1-2.ns.xsd
    Toto's mark is 12
    Tata's mark is 13
    Tutu's mark is 17
    Tutu's mark is 11
    Average is : 13.0
  - but if we try with a file which is supposed to be equivalent - and the same schema - we get a very different result :
    
    jfp$ java Average_1VA Nom_note1.ns.xml Nom_note1-2.ns.xsd
    Exception in thread "main" java.lang.Exception: Empty contents
        at Average_1VA.average_1(Average_1VA.java:45)
        at Average_1VA.main(Average_1VA.java:87)
    
    ????????????
3. A technical detail...
  That's because we forgot the warning issued in DOM manipulation of namespace aware documents, to use
  getElementsByTagNameNS(String namespaceURI, String localName)
  instead of getElementsByTagName(String name)
  
  Now, where do we find the namespaceURI?
  The JavaDoc for the Node interface has a bizarre comment about the getNamespaceURI() method :
  
  This is not a computed value that is the result of a namespace lookup based on an examination of the namespace declarations in scope. It is merely the namespace URI given at creation time.
  For nodes of any type other than ELEMENT_NODE and ATTRIBUTE_NODE and nodes created with a DOM Level 1 method, such as Document.createElement(), this is always null.
  
  However, it works !
  We modify the first lines of the average_1 method as follows (Average_1VAN) :
  
  public static float average_1(Document doc) throws Exception {
      Element start = doc.getDocumentElement();
      String ns = start.getNamespaceURI();
      if (ns == null) throw new Exception("no namespace");
      int k = 0; // number of students/marks
      int s = 0; // total
      NodeList students = start.getElementsByTagNameNS(ns, "student");
      ......... the rest of the code unchanged .........
  
  and finally
  
  jfp$ java Average_1VAN Nom_note1.ns.xml Nom_note1-2.ns.xsd
  Toto's mark is 12
  Tata's mark is 13
  Tutu's mark is 17
  Tutu's mark is 11
  Average is : 13.0
  jfp$ java Average_1VAN Nom_note1.ds.xml Nom_note1-2.ns.xsd
  Toto's mark is 12
  Tata's mark is 13
  Tutu's mark is 17
  Tutu's mark is 11
  Average is : 13.0
  
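  OK, this time our class works for both equivalent files (with prefix, with default namespace), as was (of course) expected.
  That Average_1VA should work for one and not for the other shows how treacherous namespaces are in JAXP :
  getElementsByTagName compares its argument with the full tag name, prefix included,
  so that "student" matches the tags of the default-namespace file but not the prefixed tags of the other one.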
  
  PHP is less erratic : the same Average_1V.php works also with one namespace (with prefix as well as with default).
  But of course, if several namespaces are involved, use of getElementsByTagNameNS and Co. is required !
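The difference between the two lookup methods can be reproduced in isolation. The sketch below is not the Average_1VA code: the class name TagNameLookupSketch, the prefix "e" and the two two-student documents are made up for the illustration. Both documents are parsed with namespace awareness turned on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TagNameLookupSketch {
    static final String NS = "http://epita/masters/international/";

    // Two equivalent documents: default namespace vs. explicit prefix.
    static final String DEFAULT_NS_DOC =
        "<list xmlns='" + NS + "'>"
      + "<student name='Toto' mark='12'/><student name='Tata' mark='13'/></list>";
    static final String PREFIXED_DOC =
        "<e:list xmlns:e='" + NS + "'>"
      + "<e:student name='Toto' mark='12'/><e:student name='Tata' mark='13'/></e:list>";

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    /** Lookup by qualified name: the prefix, if any, is part of what is compared. */
    static int countByName(String xml) throws Exception {
        return parseRoot(xml).getElementsByTagName("student").getLength();
    }

    /** Lookup by namespace URI and local name: the prefix plays no role. */
    static int countByNameNS(String xml) throws Exception {
        Element root = parseRoot(xml);
        return root.getElementsByTagNameNS(root.getNamespaceURI(), "student").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countByName(DEFAULT_NS_DOC) + " " + countByNameNS(DEFAULT_NS_DOC)); // 2 2
        System.out.println(countByName(PREFIXED_DOC) + " " + countByNameNS(PREFIXED_DOC));     // 0 2
    }
}
```

The empty node list returned for the prefixed document is what lies behind the "Empty contents" exception seen above.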
4. Not so technical, after all...
  Indeed, with more than one namespace, this heavy machinery is clearly justified.
  Let's see what happens to the Average_R class (for Names & Marks in RDF form) if we want validation.
  What changes from Average_R is
  
  public static float average_rdf(Document doc, PrintWriter sortie, String rdfURI, String epitaURI) throws Exception {
      Element start = doc.getDocumentElement();
      int k = 0; // number of students/marks
      int s = 0; // total
      NodeList eleves = start.getElementsByTagNameNS(rdfURI, "Description");
      for (int i = 0; i < eleves.getLength(); i++) {
          Element eleve = (Element) eleves.item(i); // explicit cast !
          String name = eleve.getAttributeNS(rdfURI, "about").substring(1); // drop the '#' sign
          Element note = (Element) eleve.getElementsByTagNameNS(epitaURI, "mark").item(0);
          .......... unchanged .............
  } // average_rdf
  
  the rest is taken from Average_1VAN.
  Whole code Average_RV.
  The same in DOM-PHP : Average_RV.php
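For readers who want to experiment without the course files, here is a compact, stand-alone sketch of the same traversal. It is not the Average_RV class: validation is left out, and the class name RdfAverageSketch as well as the two-student sample document are assumptions. Elements and the about attribute are looked up through their namespace URI, never through a prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RdfAverageSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EPITA = "http://epita/masters/international/";

    // Made-up sample in the spirit of Names & Marks in RDF form.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:e='" + EPITA + "'>"
      + "<rdf:Description rdf:about='#Toto'><e:mark>12</e:mark></rdf:Description>"
      + "<rdf:Description rdf:about='#Tata'><e:mark>13</e:mark></rdf:Description>"
      + "</rdf:RDF>";

    /** Prints each student's mark and returns the average of all marks. */
    static double average(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required by the ...NS methods used below
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList students = doc.getDocumentElement().getElementsByTagNameNS(RDF, "Description");
        if (students.getLength() == 0) throw new Exception("Empty contents");
        int total = 0;
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            String name = student.getAttributeNS(RDF, "about").substring(1); // drop the '#' sign
            Element mark = (Element) student.getElementsByTagNameNS(EPITA, "mark").item(0);
            int value = Integer.parseInt(mark.getTextContent().trim());
            System.out.println(name + "'s mark is " + value);
            total += value;
        }
        return (double) total / students.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Average is : " + average(SAMPLE)); // Average is : 12.5
    }
}
```

Changing the prefixes rdf and e in the sample to anything else leaves the result untouched, since only the namespace URIs are consulted.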
  
  Beware of namespaces !