Representing trees as text files
Jean-François Perrot
- Character Encoding
- Tree encoding
- Eclipse tools
- Visualizing the tree structure
- A word about DTDs (Document Type Definitions)
Character Encoding
Texts are made up of characters, not bytes !
Character encoding is usually not indicated in text files, but in XML the encoding is always specified:
- Explicit setting in the file header
- e.g.
<?xml version="1.0" encoding="iso-8859-1" ?>
- with default = UTF-8 (a variable-width encoding of Unicode built on 8-bit units).
<?xml version="1.0"?>
= <?xml version="1.0" encoding="utf-8" ?>
- In this course, UTF-8 encoding will be systematically used.
- Example : Capital cities 1 - Capital cities 2
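The difference between characters and bytes, and the effect of the encoding declaration, can be illustrated with a minimal Python sketch. The city name and the `city` element are made-up sample data (not the course's capital-cities files), and the parser is the one shipped in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Sample text (an assumption for illustration): 6 characters, the last one accented.
name = "Bogot\u00e1"

# The same characters give different byte sequences under different encodings.
latin1_bytes = name.encode("iso-8859-1")  # 1 byte per character here
utf8_bytes = name.encode("utf-8")         # the accented letter takes 2 bytes

# Explicit encoding in the XML declaration.
doc_latin1 = (
    b'<?xml version="1.0" encoding="iso-8859-1" ?><city>' + latin1_bytes + b"</city>"
)
# No encoding given: the parser falls back to the default, UTF-8.
doc_utf8 = b'<?xml version="1.0"?><city>' + utf8_bytes + b"</city>"

# Both documents decode to the same characters.
city_from_latin1 = ET.fromstring(doc_latin1).text
city_from_utf8 = ET.fromstring(doc_utf8).text

print(len(name), len(latin1_bytes), len(utf8_bytes))  # 6 6 7
print(city_from_latin1 == city_from_utf8 == name)     # True
```

Declaring one encoding while saving the file in another is the classic source of garbled accented letters, which is one reason to stick to UTF-8 throughout.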
Tree encoding
XML follows the tradition of markup languages [Wikipedia]
- as opposed to JSON & YAML.
i.e. it uses a system of parentheses (tags) extended with
- attributes (in the opening tag)
- text content
- that's all !
See the example of the three levels for representing cars
- Tags only
- Tags with attributes
- Tags with attributes and textual content
Another example : different ways to represent a system of names & marks.
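The three levels for the car example can be sketched as follows. The element names are borrowed from the Cars DTD shown later in these notes; the attribute values and text are invented for illustration and are not taken from the course files.

```python
import xml.etree.ElementTree as ET

# Level 1: tags only. The structure is pure nesting.
tags_only = "<Car><Body><Hood/></Body><Engine/></Car>"

# Level 2: tags with attributes (written in the opening tag).
with_attributes = (
    '<Car make="Acme" model="T1">'
    '<Body color="blue"><Hood/></Body><Engine/></Car>'
)

# Level 3: tags with attributes and textual content.
with_text = (
    '<Car make="Acme" model="T1">'
    '<Body color="blue"><Hood>steel</Hood></Body><Engine/></Car>'
)


def describe(xml_source):
    """Return (tag, attributes, text) for every node, in document order."""
    root = ET.fromstring(xml_source)
    return [(node.tag, dict(node.attrib), node.text) for node in root.iter()]


for level in (tags_only, with_attributes, with_text):
    print(describe(level))
```

The tree shape is the same at all three levels; only the information attached to each node grows.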
Eclipse tools
demo
Visualizing the tree structure
Nodes may be collapsed or expanded.
- Some text editors (e.g. TextWrangler) support this. Others (e.g. TextEdit) do not.
- Modern browsers do (but note that the view offered by a browser does not exactly reproduce the source code !)
- Eclipse
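What collapsing and expanding amounts to can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the tools above are implemented; the `outline` function, its `max_depth` parameter and the sample document are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def outline(node, max_depth, depth=0):
    """Return indented lines for the tree; nodes at max_depth are collapsed."""
    indent = "  " * depth
    children = list(node)
    if not children:
        return [indent + "  " + node.tag]   # leaf: nothing to fold
    if depth >= max_depth:
        return [indent + "+ " + node.tag]   # collapsed: children hidden
    lines = [indent + "- " + node.tag]      # expanded: children follow
    for child in children:
        lines.extend(outline(child, max_depth, depth + 1))
    return lines


sample = "<Car><Body><Hood/></Body><Engine><Cylinders/><Ignition/></Engine></Car>"
root = ET.fromstring(sample)

collapsed = outline(root, max_depth=1)
expanded = outline(root, max_depth=99)

print("\n".join(collapsed))
print("\n".join(expanded))
```

The collapsed view shows only `Car` with its folded children `Body` and `Engine`; the expanded view lists every node.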
A word about DTDs (Document Type Definitions)
- A way of specifying tree structure
- inherited from SGML, the common ancestor of all markup languages [Wikipedia].
- superseded (see session #3)
- but still widely used
See [Wikipedia] for details.
Note that a DTD is an integral part of the structure of the XML document, whereas XML schemas or Relax NG grammars are linked to the file by means of an ordinary attribute.
This is due to inheritance from SGML, and adds quite substantially to the complexity of XML programming, as we shall see later (DOM).
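The contrast between the two linking mechanisms can be made visible with a small Python sketch. The file names `cars.dtd` and `cars.xsd` and the attribute values are hypothetical; the standard-library parser used here neither fetches nor applies either file, so the sketch only shows where the link sits in the document.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# DTD: declared by a DOCTYPE construct that sits outside the element tree.
with_dtd = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE Car SYSTEM "cars.dtd">'
    '<Car make="Acme" model="T1"/>'
)

# XML Schema: linked by an ordinary (namespaced) attribute on the root element.
with_schema = (
    '<?xml version="1.0"?>'
    '<Car xmlns:xsi="' + XSI + '" '
    'xsi:noNamespaceSchemaLocation="cars.xsd" make="Acme" model="T1"/>'
)

dtd_root = ET.fromstring(with_dtd)
schema_root = ET.fromstring(with_schema)

# The DOCTYPE leaves no trace among the attributes of the root element...
dtd_attributes = sorted(dtd_root.attrib)
# ...whereas the schema link is just one more attribute.
schema_link = schema_root.get("{" + XSI + "}noNamespaceSchemaLocation")

print(dtd_attributes)
print(schema_link)
```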
- Example : XML file, DTD file
<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD for Cars -->
<!ELEMENT Car (Body, Engine, Transmission)>
<!ATTLIST Car make CDATA #REQUIRED>
<!ATTLIST Car model CDATA #REQUIRED>
<!ELEMENT Body (Hood)>
<!ATTLIST Body color CDATA #REQUIRED>
<!ELEMENT Hood (#PCDATA)>
<!ELEMENT Engine (Cylinders, Ignition)>
<!ELEMENT Cylinders EMPTY>
<!ELEMENT Ignition (#PCDATA)>
<!ELEMENT Transmission (GearBox, FrontAxle, RearAxle)>
<!ATTLIST Transmission type (automatic | manual) #REQUIRED>
<!ATTLIST Transmission gear_nb (3 | 4 | 5) #REQUIRED>
<!ELEMENT GearBox EMPTY>
<!ELEMENT FrontAxle EMPTY>
<!ELEMENT RearAxle EMPTY>
- Validation
- by the W3C Validator
http://validator.w3.org/ (for a publicly accessible DTD file that is referred from the XML)
- with Eclipse
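To make concrete what a validator checks against the Cars DTD, here is a hand-written Python sketch. It is not a DTD validator (the parser in Python's standard library does not validate): the rules below are transcribed manually from the DTD above, only child sequences and required attributes are checked, and the sample document and the `check` function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Content models transcribed by hand from the <!ELEMENT ...> declarations.
CHILDREN = {
    "Car": ["Body", "Engine", "Transmission"],
    "Body": ["Hood"],
    "Engine": ["Cylinders", "Ignition"],
    "Transmission": ["GearBox", "FrontAxle", "RearAxle"],
}
# Required attributes from the <!ATTLIST ...> declarations.
# None stands for CDATA (any value); a set lists the enumerated values.
REQUIRED = {
    "Car": {"make": None, "model": None},
    "Body": {"color": None},
    "Transmission": {"type": {"automatic", "manual"}, "gear_nb": {"3", "4", "5"}},
}


def check(node):
    """Return a list of problems found in node and its descendants."""
    problems = []
    expected = CHILDREN.get(node.tag, [])
    found = [child.tag for child in node]
    if found != expected:
        problems.append(f"{node.tag}: children {found}, expected {expected}")
    for name, allowed in REQUIRED.get(node.tag, {}).items():
        value = node.get(name)
        if value is None:
            problems.append(f"{node.tag}: missing attribute {name}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{node.tag}: bad value {value!r} for {name}")
    for child in node:
        problems.extend(check(child))
    return problems


valid = (
    '<Car make="Acme" model="T1">'
    '<Body color="red"><Hood>steel</Hood></Body>'
    "<Engine><Cylinders/><Ignition>electronic</Ignition></Engine>"
    '<Transmission type="manual" gear_nb="5">'
    "<GearBox/><FrontAxle/><RearAxle/></Transmission>"
    "</Car>"
)
# Break two rules: an out-of-range gear_nb and a missing GearBox.
invalid = valid.replace('gear_nb="5"', 'gear_nb="6"').replace("<GearBox/>", "")

valid_problems = check(ET.fromstring(valid))
invalid_problems = check(ET.fromstring(invalid))

print(valid_problems)
print(invalid_problems)
```

A real validator, such as the W3C service or Eclipse, reads the DTD itself instead of relying on a hand transcription, and also checks what this sketch ignores (undeclared attributes, text inside EMPTY elements, and so on).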