XML technologies

DOM-Java by example

Jean-François Perrot

  1. About the Document Object Model (DOM)

  2. Elementary approach
    1. Using an XML document (from an XML file)
    2. Building an XML document from a text file
    3. Transforming an XML document into another one

  3. Advanced features
    1. Character encoding of text files
    2. Linking to a DTD
    3. Examples

  4. Homework

About the Document Object Model (DOM)

The DOM aims to provide an abstract framework for representing XML trees that stays essentially the same across programming languages.
As an illustration, I provide here both Java and PHP implementations of the same programs.
Remember that the only concepts of Object-Oriented Programming that are common to all languages are those of Class, Instance and (Class) Inheritance.
Since classes carry executable code, which cannot be formulated without reference to a given language,
the common abstract specification deals only with Interfaces (in the Java sense).
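
To make the point about interfaces concrete, here is a minimal sketch (the class name DomInterfacesSketch and the sample XML string are mine, chosen for illustration only). The program is written entirely against the org.w3c.dom interfaces; the concrete classes behind them are supplied by whatever parser the JDK provides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomInterfacesSketch {
    // Parses an XML string and returns it as a DOM Document (an interface type).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not one of the course data files.
        Document doc = parse("<greeting lang=\"en\">hello</greeting>");
        Element root = doc.getDocumentElement();   // Element is an interface
        Node text = root.getFirstChild();          // so is Node
        System.out.println(root.getTagName() + " / " + root.getAttribute("lang")
                + " / " + text.getNodeValue());
        // The type we program against is an interface...
        System.out.println("Document is an interface: " + Document.class.isInterface());
        // ...while the object we actually get is an instance of a parser-specific class.
        System.out.println("Concrete class: " + doc.getClass().getName());
    }
}
```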

Hierarchy

Elementary approach

  1. Using an XML document (from an XML file)

    1. Reading names & marks, computing the average of the class (model #1 - with attributes)

      Given this data file, we want to obtain the following run:

      jfp$ java Average_1 NameMark1.xml
      Toto's mark is 12
      Tata's mark is 13
      Tutu's mark is 17
      Titi's mark is 07

      Average is : 12.25
      jfp$



      Java class: Average_1.
      PHP code
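
For readers without the linked files at hand, here is a minimal sketch of what such a program might look like. It is not the Average_1 class itself. It assumes, hypothetically, that each student is an empty element of the form <student name="Toto" mark="12"/>; the element and attribute names in the actual data file may differ.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AverageAttrSketch {
    // Prints one line per student and returns the class average.
    // Assumption: students are <student name="..." mark="..."/> elements.
    static double report(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        int total = 0;
        for (int i = 0; i < students.getLength(); i++) {
            Element s = (Element) students.item(i);
            String mark = s.getAttribute("mark");      // kept as text, so "07" prints as "07"
            System.out.println(s.getAttribute("name") + "'s mark is " + mark);
            total += Integer.parseInt(mark);
        }
        return (double) total / students.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The parser reads the file as bytes and decodes it according to the XML header.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        double average = report(doc);
        System.out.println();
        System.out.println("Average is : " + average);
    }
}
```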

    2. Reading names & marks, computing the average of the class (model #2 - with child nodes)
      We now want exactly the same output, but with the data in the other XML format: data file.

      Java class: Average_2.
      PHP code
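
Again as a sketch rather than the Average_2 class itself: with child nodes, the values are no longer attributes but the text content of sub-elements. I assume here, hypothetically, a layout such as <student><name>Toto</name><mark>12</mark></student>; the helper childText is my own name.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AverageChildSketch {
    // Returns the trimmed text of the first descendant of 'parent' named 'tag'.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Prints one line per student and returns the class average.
    // Assumption: <student><name>...</name><mark>...</mark></student>.
    static double report(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        int total = 0;
        for (int i = 0; i < students.getLength(); i++) {
            Element s = (Element) students.item(i);
            String mark = childText(s, "mark");
            System.out.println(childText(s, "name") + "'s mark is " + mark);
            total += Integer.parseInt(mark);
        }
        return (double) total / students.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        double average = report(doc);
        System.out.println();
        System.out.println("Average is : " + average);
    }
}
```

Only the way a value is fetched changes between the two models; the traversal and the arithmetic are identical.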
  2. Building an XML document from a text file

    Data file

    1. Building for model #1 - with attributes: Java class Build1fromText
      (usage: java Build1fromText NamesMarks.txt NN1.xml)
      PHP code

    2. Building for model #2 - with child nodes: Java class Build2fromText
      PHP code
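
The construction can be sketched as follows for model #1. This is an illustration under assumptions, not the Build1fromText class: I suppose that each line of the text file holds a name and a mark separated by whitespace, and the element names class and student are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildFromTextSketch {
    // Builds a model #1 document (values as attributes) from "name mark" lines.
    static Document build(List<String> lines) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("class");      // hypothetical root name
        doc.appendChild(root);
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue;             // skip blank or incomplete lines
            Element student = doc.createElement("student");
            student.setAttribute("name", parts[0]);
            student.setAttribute("mark", parts[1]);
            root.appendChild(student);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Files.readAllLines(Path) decodes the text file as UTF-8.
        Document doc = build(Files.readAllLines(Paths.get(args[0])));
        // An identity Transformer serializes the DOM tree to the output file.
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        trans.transform(new DOMSource(doc), new StreamResult(new File(args[1])));
    }
}
```

For model #2, the two setAttribute calls would be replaced by creating a child element for each value and setting its text content.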
  3. Transforming an XML document into another one

    1. Names & Marks from model #1 to model #2
      Data file.
      Java class: One2Two.
      (usage: java One2Two NameMark1.xml NN2.xml)
      PHP code
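
A transformation of this kind reads one DOM tree and builds another. The sketch below is not the One2Two class; it assumes the same hypothetical names as before (student elements with name and mark attributes in model #1, name and mark child elements in model #2).

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class One2TwoSketch {
    // Copies every <student name=".." mark=".."/> of 'in' into a new document,
    // turning each attribute into a child element with the same name.
    static Document toChildNodes(Document in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document out = builder.newDocument();
        Element root = out.createElement(in.getDocumentElement().getTagName());
        out.appendChild(root);
        NodeList students = in.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element src = (Element) students.item(i);
            Element dst = out.createElement("student");
            for (String key : new String[] {"name", "mark"}) {
                Element child = out.createElement(key);
                child.setTextContent(src.getAttribute(key));
                dst.appendChild(child);
            }
            root.appendChild(dst);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document in = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(toChildNodes(in)), new StreamResult(new File(args[1])));
    }
}
```

Note that the new nodes are created by the output document: a node belongs to the document that created it and cannot simply be appended to another one.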

Advanced features

  1. Character encoding of text files

    The point is to ensure that your text data are read with the proper character encoding.
    Note that this applies to raw text only, not to XML files.
    Indeed, the character encoding of an XML file is declared in its XML header (in the absence of a declaration, the XML specification prescribes UTF-8 or UTF-16).
    Observe that the parse method of the DocumentBuilder class reads an InputStream, i.e. bytes, not characters,
    and that it uses the XML header to decide how to decode the bytes.

    This is not the case for raw text, so it is your responsibility to specify how you intend to read it.
    By default, Java will use the "platform default encoding", which before Java 18 depends on your installation (it is UTF-8 from Java 18 onwards) and may not be the one you intended.
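
Here is a minimal sketch of reading raw text with an explicit encoding. The class name ReadWithEncodingSketch and the sample line are mine; the key point is the second argument of InputStreamReader, which names the charset instead of leaving it to the platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithEncodingSketch {
    // Decodes the bytes of 'in' with the given charset and returns the lines.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a name with an accented letter, stored as Latin-1 bytes.
        byte[] latin1 = "Zo\u00e9 15".getBytes(StandardCharsets.ISO_8859_1);
        // Right charset: the accented letter is recovered.
        System.out.println(readLines(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1));
        // Wrong charset: the same bytes are not valid UTF-8 and the letter is lost.
        System.out.println(readLines(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream on the file in question.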

    Things are slightly different for file output:

    1. If you do not say anything,
      • your output file will be encoded as UTF-8
      • and the corresponding information will be written in the XML header.

    2. If you wish to use another encoding, you have to inform the Transformer, e.g.
      trans.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

      • your output file will be encoded as you chose (in our example, Latin-1)
      • and the corresponding information will be written in the XML header.
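
The two output cases can be tried out with the following sketch (the class name and the sample element are hypothetical). It serializes the same small document twice, once with the default encoding and once as Latin-1, into byte arrays so that the result can be inspected without creating files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class WriteWithEncodingSketch {
    // Serializes 'doc' to bytes; a null 'encoding' leaves the Transformer default in place.
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        if (encoding != null) {
            trans.setOutputProperty(OutputKeys.ENCODING, encoding);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        trans.transform(new DOMSource(doc), new StreamResult(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element student = doc.createElement("student");   // hypothetical element
        student.setAttribute("name", "Zo\u00e9");
        doc.appendChild(student);

        // Case 1: nothing said, the header announces the default encoding.
        System.out.println(new String(serialize(doc, null), StandardCharsets.UTF_8));
        // Case 2: Latin-1 requested, the header announces it and the bytes follow suit.
        System.out.println(new String(serialize(doc, "ISO-8859-1"), StandardCharsets.ISO_8859_1));
    }
}
```

Since the header and the bytes always agree, a parser can read either output back without being told the encoding.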

    You may wish to review the java.io package.

    Building for names & marks model #1: Java class Build1CfromText
    Data file

    This does not apply to PHP (wait for PHP-6!).
  2. Linking to a DTD

    This requires a DOMImplementation object, in PHP as well as in Java.

    Building for model #1 : Java class Build1CDfromText
    PHP code (Build1DFromText.php).
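
The role of the DOMImplementation object can be sketched as follows. This is not the Build1CDfromText class: the DTD file name NameMark1.dtd and the element names are assumptions of mine. The DOMImplementation creates the DocumentType node and a document attached to it; in this sketch the Transformer is also told explicitly to write the DOCTYPE line, through the DOCTYPE_SYSTEM output property.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

public class DtdLinkSketch {
    // Creates an empty document whose root is 'rootName' and whose DOCTYPE
    // points to the DTD file 'dtdFile' (used as the system identifier).
    static Document newLinkedDocument(String rootName, String dtdFile) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        DocumentType doctype = impl.createDocumentType(rootName, null, dtdFile);
        return impl.createDocument(null, rootName, doctype);  // root element is created for us
    }

    // Serializes 'doc' to a string, asking for the DOCTYPE line explicitly.
    static String serialize(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newLinkedDocument("class", "NameMark1.dtd");  // hypothetical names
        Element student = doc.createElement("student");
        student.setAttribute("name", "Toto");
        student.setAttribute("mark", "12");
        doc.getDocumentElement().appendChild(student);
        System.out.println(serialize(doc));
    }
}
```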
  3. Examples

Homework

  1. Write a class Two2OneF, the counterpart of One2TwoF.

  2. Build XML Names&Marks from an Excel file (csv) instead of a plain text file.

  3. A more substantial exercise to try your hand at DOM-Java.
    Let me know if you encounter any problem.