XML technologies
DOM-Java by example
- About the Document Object Model (DOM)
- Elementary approach
- Using an XML document (from an XML file)
- Building an XML document from a text file
- Transforming an XML document into another one
- Advanced features
- Character encoding of text files
- Linking to a DTD
- Examples
- Homework
About the Document Object Model (DOM)
The DOM aims to provide an abstract framework for representing XML
trees that remains essentially the same across programming
languages.
As an illustration, I provide here both Java and PHP implementations of
the same programs.
Remember that the only concepts of Object-Oriented Programming that are
valid in all languages are those of Class, Instance and
(Class) Inheritance.
Since classes carry executable code, which cannot be
formulated without reference to a given language,
the common abstract specification deals only with Interfaces
(in the Java sense).
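
To make this concrete, here is a minimal sketch in which the program touches the tree only through the org.w3c.dom interfaces (Document, Element, Node, NodeList); the concrete classes behind them are supplied by whichever parser is installed. The class name DomInterfacesSketch and the sample document are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomInterfacesSketch {

    // Parses an XML string and returns the tree as a Document (an interface).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Counts the element children of a node using only DOM interface methods.
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not one of the course data files.
        Document doc = parse("<class><student/><student/></class>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName() + " has "
                + countElementChildren(root) + " element children");
        // The concrete class name printed here depends on the installed parser.
        System.out.println("Implementation: " + doc.getClass().getName());
    }
}
```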

Using an XML document (from an XML file)
- Reading names & marks, computing the average of the class (model #1 - with attributes)
With this data file, we wish to obtain the following execution:
jfp$ java Average_1 NameMark1.xml
Toto's mark is 12
Tata's mark is 13
Tutu's mark is 17
Titi's mark is 07
Average is : 12.25
jfp$
Java class: Average_1. PHP code.
- Reading names & marks, computing the average of the class (model #2 - with child nodes)
We now want exactly the same execution record, but with data in the
other XML format: data file.
Java class: Average_2. PHP code.
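
The two readings can be sketched side by side as follows. This is not the code of Average_1 or Average_2: the element and attribute names (class, student, name, mark) are assumptions about the two data formats, and the documents are parsed from in-memory strings instead of the file given on the command line.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AverageSketch {

    // Parses an in-memory XML string; a real program would parse the file named in args[0].
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Model #1 (assumed layout): <student name="..." mark="..."/>
    static double averageFromAttributes(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        int sum = 0;
        for (int i = 0; i < students.getLength(); i++) {
            Element s = (Element) students.item(i);
            String mark = s.getAttribute("mark");
            System.out.println(s.getAttribute("name") + "'s mark is " + mark);
            sum += Integer.parseInt(mark);
        }
        return (double) sum / students.getLength();
    }

    // Model #2 (assumed layout): <student><name>...</name><mark>...</mark></student>
    static double averageFromChildren(Document doc) {
        NodeList students = doc.getElementsByTagName("student");
        int sum = 0;
        for (int i = 0; i < students.getLength(); i++) {
            Element s = (Element) students.item(i);
            String name = s.getElementsByTagName("name").item(0).getTextContent().trim();
            String mark = s.getElementsByTagName("mark").item(0).getTextContent().trim();
            System.out.println(name + "'s mark is " + mark);
            sum += Integer.parseInt(mark);
        }
        return (double) sum / students.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document model1 = parse("<class>"
                + "<student name=\"Toto\" mark=\"12\"/>"
                + "<student name=\"Titi\" mark=\"07\"/>"
                + "</class>");
        System.out.println("Average is : " + averageFromAttributes(model1));

        Document model2 = parse("<class>"
                + "<student><name>Toto</name><mark>12</mark></student>"
                + "<student><name>Titi</name><mark>07</mark></student>"
                + "</class>");
        System.out.println("Average is : " + averageFromChildren(model2));
    }
}
```

Only the way each value is fetched differs: getAttribute for model #1, the text content of a child element for model #2.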
Building an XML document from a text file
Data file
- Building for model #1 - with attributes: Java class Build1fromText (usage: java Build1fromText NamesMarks.txt NN1.xml). PHP code.
- Building for model #2 - with child nodes: Java class Build2fromText. PHP code.
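
As a rough sketch of the model #1 case (not the code of Build1fromText), the program below creates an empty Document, adds one element per text line, and serializes the tree with a Transformer. It assumes that each line of the text file holds a name and a mark separated by whitespace, and that the elements are called class and student; both are guesses about the data files.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildFromTextSketch {

    // Builds a model #1 tree: one <student name=".." mark=".."/> per text line.
    static Document build(List<String> lines) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("class");
        doc.appendChild(root);
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            Element student = doc.createElement("student");
            student.setAttribute("name", fields[0]);
            student.setAttribute("mark", fields[1]);
            root.appendChild(student);
        }
        return doc;
    }

    // Serializes the tree to a string with an identity transformation.
    static String serialize(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // In a real program the lines would be read from the text file named in args[0].
        Document doc = build(Arrays.asList("Toto 12", "Tata 13", "Tutu 17", "Titi 07"));
        System.out.println(serialize(doc));
    }
}
```

For model #2, the two setAttribute calls would be replaced by child elements, each holding a text node created with createTextNode.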
Transforming an XML document into another one
- Names & Marks from model #1 to model #2
Data file.
Java class: One2Two (usage: java One2Two NameMark1.xml NN2.xml). PHP code.
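
One possible way to perform this transformation, given here as a sketch and not as the code of One2Two, is to walk the model #1 tree and build a second Document in which each attribute becomes a child element. The element and attribute names are the same assumptions as before (class, student, name, mark).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class One2TwoSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Creates <tag>text</tag> owned by the given document.
    static Element textElement(Document doc, String tag, String text) {
        Element e = doc.createElement(tag);
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    // Builds a new model #2 document from a model #1 document.
    static Document toModel2(Document model1) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document model2 = builder.newDocument();
        Element root = model2.createElement(model1.getDocumentElement().getTagName());
        model2.appendChild(root);
        NodeList students = model1.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element in = (Element) students.item(i);
            Element out = model2.createElement("student");
            out.appendChild(textElement(model2, "name", in.getAttribute("name")));
            out.appendChild(textElement(model2, "mark", in.getAttribute("mark")));
            root.appendChild(out);
        }
        return model2;
    }

    public static void main(String[] args) throws Exception {
        Document model1 = parse("<class><student name=\"Toto\" mark=\"12\"/></class>");
        Document model2 = toModel2(model1);
        Element first = (Element) model2.getElementsByTagName("student").item(0);
        System.out.println(first.getElementsByTagName("name").item(0).getTextContent());
        System.out.println(first.getElementsByTagName("mark").item(0).getTextContent());
    }
}
```

Nodes belong to the Document that created them, which is why the new elements are created by model2 and not moved over from model1.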
Character encoding of text files
The point is to ensure that your text data are read with the proper
character encoding.
Note that this applies to raw text only, not to XML files.
Indeed, the character encoding of an XML file is always specified:
- either as an explicit indication in the xml header, e.g. <?xml version="1.0" encoding='ISO-8859-1'?>
- or, when the header gives no encoding, by defaulting to UTF-8, e.g. <?xml version="1.0"?>
Observe that the parse method of the DocumentBuilder class reads an
InputStream, i.e. bytes, not characters,
and that it uses the xml header to decide how to decode the bytes.
This is not the case for raw text, so it is your responsibility to
specify how you intend to read it.
By default, Java will use the "platform default encoding", which depends
on your installation and may not be the one you intended.
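
A minimal sketch of reading raw text with an explicit encoding is given below. The class name ReadTextSketch and the sample name are hypothetical; the bytes come from an in-memory array so that the example is self-contained, whereas a real program would open a FileInputStream on the text file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadTextSketch {

    // Decodes the bytes of the stream with the given charset and returns the lines.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A name with an e-acute (\u00e9), stored as Latin-1 bytes.
        byte[] latin1 = "R\u00e9mi 15".getBytes(StandardCharsets.ISO_8859_1);

        // Decoded with the encoding that was used to write the bytes: the accent survives.
        System.out.println(readLines(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1));

        // Decoded with a different encoding: the accented character is damaged.
        System.out.println(readLines(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
    }
}
```

Passing the Charset to InputStreamReader is what removes the dependency on the platform default.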
Things are slightly different for file output:
- it is essential that the encoding information in the xml header
of an XML file match the actual encoding of the file!
- therefore, this information must be given only once. But to whom,
and how?
  - If you do not say anything,
    - your output file will be encoded as UTF-8
    - and the corresponding information will be written in the xml header.
  - If you wish to use another encoding, you have to inform the Transformer, e.g.
    trans.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
    - your output file will be encoded as you chose (in our example, Latin-1)
    - and the corresponding information will be written in the xml header.
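
The two cases above can be tried with the following sketch, which serializes the same small tree to bytes, once without any indication and once after setting OutputKeys.ENCODING. The class name OutputEncodingSketch and the sample document are assumptions; the bytes go to a ByteArrayOutputStream here, where a real program would use a FileOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OutputEncodingSketch {

    // Hypothetical one-element document containing a non-ASCII character.
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element student = doc.createElement("student");
        student.setAttribute("name", "R\u00e9mi");
        doc.appendChild(student);
        return doc;
    }

    // Serializes to bytes; a null encoding leaves the Transformer default in place.
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        if (encoding != null) {
            trans.setOutputProperty(OutputKeys.ENCODING, encoding);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The result wraps a byte stream, so the Transformer does the encoding itself.
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] byDefault = serialize(sample(), null);
        byte[] latin1 = serialize(sample(), "ISO-8859-1");
        System.out.println(new String(byDefault, StandardCharsets.UTF_8));
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Because the Transformer both writes the header and encodes the bytes, the two stay consistent. Handing it a Writer with its own encoding would reintroduce the risk of a mismatch.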
You may wish to review the java.io package.
Building for names & marks model #1: Java class Build1CfromText. Data file.
This does not apply to PHP (wait for PHP-6 !).
Linking to a DTD
This requires using a DOMImplementation object, in PHP as well as in Java.
Building for model #1: Java class Build1CDfromText. PHP code (Build1DFromText.php).
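
As a sketch of the idea (not the code of Build1CDfromText), the DOMImplementation creates a DocumentType node and then a Document that carries it. The DTD file name NameMark1.dtd and the root element name class are hypothetical. Since an identity transformation may not copy the DocumentType node by itself, the system identifier is also passed to the Transformer as an output property.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

public class DoctypeSketch {

    // Creates an empty document whose DOCTYPE points to the given DTD (SYSTEM reference).
    static Document newDocumentWithDtd(String rootName, String systemId) throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        DocumentType doctype = impl.createDocumentType(rootName, null, systemId);
        return impl.createDocument(null, rootName, doctype);
    }

    // Serializes the document, asking the Transformer to write the DOCTYPE line.
    static String serialize(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null && doctype.getSystemId() != null) {
            trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        }
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // "NameMark1.dtd" is a hypothetical file name used for illustration.
        Document doc = newDocumentWithDtd("class", "NameMark1.dtd");
        Element student = doc.createElement("student");
        student.setAttribute("name", "Toto");
        student.setAttribute("mark", "12");
        doc.getDocumentElement().appendChild(student);
        System.out.println(serialize(doc));
    }
}
```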
Examples
- From model #1 to model #2 revisited: Java class One2TwoF.
Final version: character encoding & DTD & modular design.
- From model #1 to XHTML: Java class One2HtmlT.
The generated file is complete with namespace & DTD with PUBLIC and SYSTEM
references, ready for checking by the W3C validator.
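
For the XHTML case, a possible skeleton is sketched below; it is not the code of One2HtmlT. It assumes the XHTML 1.0 Strict DTD (the flavour actually used by One2HtmlT is not stated here) and creates every element in the XHTML namespace with createElementNS.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

public class XhtmlSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Assumption: XHTML 1.0 Strict identifiers.
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";
    static final String SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Creates <tag>text</tag> in the XHTML namespace.
    static Element xhtml(Document doc, String tag, String text) {
        Element e = doc.createElementNS(XHTML_NS, tag);
        if (text != null) {
            e.appendChild(doc.createTextNode(text));
        }
        return e;
    }

    // Builds a minimal XHTML page with a title and one paragraph.
    static Document newPage(String title, String paragraph) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DOMImplementation impl = factory.newDocumentBuilder().getDOMImplementation();
        DocumentType doctype = impl.createDocumentType("html", PUBLIC_ID, SYSTEM_ID);
        Document doc = impl.createDocument(XHTML_NS, "html", doctype);
        Element head = xhtml(doc, "head", null);
        head.appendChild(xhtml(doc, "title", title));
        Element body = xhtml(doc, "body", null);
        body.appendChild(xhtml(doc, "p", paragraph));
        doc.getDocumentElement().appendChild(head);
        doc.getDocumentElement().appendChild(body);
        return doc;
    }

    // Serializes the page with both PUBLIC and SYSTEM references in the DOCTYPE.
    static String serialize(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, PUBLIC_ID);
        trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, SYSTEM_ID);
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(newPage("Names & Marks", "Toto's mark is 12")));
    }
}
```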
Homework
- Write a class Two2OneF, the counterpart of One2TwoF.
- Build XML Names & Marks from an Excel file (csv) instead of a
plain text file.
- A more substantial exercise to try your hand at DOM-Java.
Let me know if you encounter any problem.