XML technologies
A Minimal Introduction to XPath
- About XPath
- XPath is a language
for specifying sets of paths in an XML tree.
- A first approximation
of XPath is given by the language of Unix
filenames
- Use of predicates
- A whole library of functions
- Java
- Package javax.xml.xpath
- A simple example
We restrict ourselves here to
- XPath version 1.0
- the simplified syntax
For a more complete treatment see Wikipedia,
or one of the many tutorials available online.
There used to be an excellent introduction to XPath at O'Reilly, which
disappeared this year (2015).
The 10-Minute XPath Tutorial is no replacement
for it, but provides a good starting point.
-
A well-formed word in XPath is an expression
(in the sense of arithmetic or boolean expressions in programming languages,
i.e. a construct which is evaluated, not executed, as opposed to a statement)
- to be evaluated in a given context (a node n in an XML document D),
- the value of which may be either
- a set of paths of D originating at n,
or (equivalently) the set of nodes of D that are reached from node n by the said paths,
- a string
- a number (XPath 1.0 has no separate integer type)
- a boolean
Note that the XPath facility in Eclipse works with node sets only!
To get full functionality, install the "Eclipse XPath evaluation" plugin.
Examples with the Garage.xml document, evaluated with xmllint --xpath at the root of the document.
-
- Tree structure of directories where tags = directories
XPath expression "/Garage/Car/Engine/Ignition"
evaluates to the set of the 3 <Ignition> nodes of the 3 cars in the garage.
jfp$ xmllint Garage.xml --xpath "/Garage/Car/Engine/Ignition"
<Ignition>Working well</Ignition><Ignition>defective</Ignition><Ignition>All right</Ignition>
jfp$
An often used shorthand is "//tag-name", indicating all nodes with the said tag, wherever they are.
jfp$ xmllint Garage.xml --xpath "//Transmission"
<Transmission type="manual" gear_nb="4">
<GearBox/>
<FrontAxle/>
<RearAxle/>
</Transmission><Transmission type="automatic"
gear_nb="5">
<GearBox/>
<FrontAxle/>
<RearAxle/>
</Transmission><Transmission type="manual"
gear_nb="5">
<GearBox/>
<FrontAxle/>
<RearAxle/>
</Transmission>
jfp$
- Extended with a notation for XML attributes
XPath expression "/Garage/Car/@make"
evaluates to the set of the 3 attribute nodes "make" of the 3 cars in the garage.
jfp$ xmllint Garage.xml --xpath "/Garage/Car/@make"
make="Citroën" make="Renault" make="Toyota"
jfp$
- Augmented with a useful function, count, operating on the set of nodes that is the value of its argument
XPath expression "count(/Garage/Car/Engine)"
evaluates to 3 (as a number)
jfp$ xmllint Garage.xml --xpath "count(/Garage/Car/Engine)"
3
jfp$
as well as "count(/Garage/Car/Body/@color)"
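The path, attribute and count expressions above can also be tried from Java, using the javax.xml.xpath package. The following is only a minimal sketch: the GARAGE string is a hypothetical stand-in for Garage.xml, reduced to the elements these expressions touch, and the class and helper names (GaragePaths, texts, number) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GaragePaths {
    // Assumption: a reduced, hypothetical version of Garage.xml (three cars).
    static final String GARAGE =
          "<Garage>"
        + "<Car make='Citro\u00ebn'><Engine><Ignition>Working well</Ignition></Engine></Car>"
        + "<Car make='Renault'><Engine><Ignition>defective</Ignition></Engine></Car>"
        + "<Car make='Toyota'><Engine><Ignition>All right</Ignition></Engine></Car>"
        + "</Garage>";

    static final XPath XP = XPathFactory.newInstance().newXPath();

    // Evaluates expr at the document root and returns the text of each selected node
    // (element content for elements, attribute value for attributes).
    static List<String> texts(String expr) {
        try {
            NodeList nodes = (NodeList) XP.evaluate(
                    expr, new InputSource(new StringReader(GARAGE)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    // Evaluates a number-valued expression such as count(...).
    static double number(String expr) {
        try {
            return (Double) XP.evaluate(
                    expr, new InputSource(new StringReader(GARAGE)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts("/Garage/Car/Engine/Ignition")); // full path from the root
        System.out.println(texts("//Ignition"));                  // "anywhere" shorthand
        System.out.println(texts("/Garage/Car/@make"));           // attribute nodes
        System.out.println(number("count(/Garage/Car/Engine)"));  // a number, not a node set
    }
}
```

Note that count comes back as a Double, so the sketch prints 3.0 where xmllint prints 3.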
-
Predicates are tests (boolean expressions) written between square
brackets, which result in selecting those nodes for which the test
succeeds.
Combined with XPath functions they are quite powerful.
Examples with the Garage.xml document (continued)
- XPath expression "/Garage/Car[@make = 'Renault']/Engine/Ignition"
evaluates to the set of only one of the 3 <Ignition> nodes of the 3 cars in the garage.
Note that the equality comparator is "=", not "==".
jfp$ xmllint Garage.xml --xpath "/Garage/Car[@make = 'Renault']/Engine/Ignition"
<Ignition>defective</Ignition>
jfp$
- XPath expression "//Body[@color = 'red']/../@make"
evaluates to the set of only one of the 3 make attributes of the 3 cars in the garage.
Recall that the double slash "//" means "anywhere from the root", and that ".." means "the parent node".
jfp$ xmllint Garage.xml --xpath "//Body[@color = 'red']/../@make"
make="Citroën"
jfp$
What is the value of "//Transmission[@gear_nb = 5]/../Body/@color"?
- Predicates may be combined with the usual logical connectives or, and and not
(in XPath 1.0, or and and are operators, whereas not is a function, written not(...)) :
e.g. "/Garage/Car[Body/@color = 'red' and Transmission/@type = 'manual']"
jfp$ xmllint Garage.xml --xpath "/Garage/Car[Body/@color = 'red' and Transmission/@type = 'manual']/Engine/Ignition"
<Ignition>Working well</Ignition>
Other examples, with Names & marks documents :
- format #1 (with attributes) :
xmllint NN1 --xpath "count(/list/student[@mark > 12])"
--> 9
- format #2 (with child nodes) :
xmllint NN2 --xpath "count(//student[mark > 12])"
--> 9
(meaning the number of student nodes having a child node named mark
the content of which, interpreted as a number, is larger than 12)
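The predicate examples on the garage can be sketched in Java as follows. This is an illustration under stated assumptions, not the course's own code: the GARAGE string is a hypothetical reduction of Garage.xml, the colors 'blue' and 'grey' are invented (only the red Citroën is known from the outputs above), and the names GaragePredicates and texts are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GaragePredicates {
    // Assumption: hypothetical stand-in for Garage.xml; 'blue' and 'grey' are invented values.
    static final String GARAGE =
          "<Garage>"
        + "<Car make='Citro\u00ebn'><Body color='red'/>"
        + "<Engine><Ignition>Working well</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='4'/></Car>"
        + "<Car make='Renault'><Body color='blue'/>"
        + "<Engine><Ignition>defective</Ignition></Engine>"
        + "<Transmission type='automatic' gear_nb='5'/></Car>"
        + "<Car make='Toyota'><Body color='grey'/>"
        + "<Engine><Ignition>All right</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='5'/></Car>"
        + "</Garage>";

    static final XPath XP = XPathFactory.newInstance().newXPath();

    // Evaluates expr at the document root and returns the text of each selected node.
    static List<String> texts(String expr) {
        try {
            NodeList nodes = (NodeList) XP.evaluate(
                    expr, new InputSource(new StringReader(GARAGE)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    public static void main(String[] args) {
        // Test on an attribute of the node itself.
        System.out.println(texts("/Garage/Car[@make = 'Renault']/Engine/Ignition"));
        // Test on a Body node, then step up to the parent Car with "..".
        System.out.println(texts("//Body[@color = 'red']/../@make"));
        // Two tests joined with "and".
        System.out.println(texts(
            "/Garage/Car[Body/@color = 'red' and Transmission/@type = 'manual']/Engine/Ignition"));
        // "or" is an operator; "not" is written as a function call.
        System.out.println(texts("/Garage/Car[@make = 'Renault' or @make = 'Toyota']/Engine/Ignition"));
        System.out.println(texts("/Garage/Car[not(Transmission/@type = 'manual')]/@make"));
    }
}
```

The sketch leaves the question about "//Transmission[@gear_nb = 5]/../Body/@color" unanswered on purpose, since its result depends on the colors in the real Garage.xml.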
-
- We already saw count (number value, applied to a node set)
- A typical example of arithmetic functions & operators : average
xmllint NN2 --xpath "(sum(//mark) div count(//student))"
--> 12.4286
- function position() yields the rank of the current node among its siblings. Often used in predicates :
/list/student[position()=1]
will yield the first student in the list, etc. - commonly abbreviated as /list/student[1]
- function last() gives the index of the last sibling
/list/student[position()=last()]
will yield the last student in the list - which may be abbreviated as /list/student[last()]
- a number of string-valued functions, e.g. concat(s1, s2, ...)
and boolean-valued functions with string arguments, e.g. contains(str, substr)
see Wikipedia for a complete list.
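The functions listed above can be exercised from Java on a small list in the spirit of format #2. This is a sketch only: the LIST document, its four students and their marks are invented for illustration (the real NN2 file is not reproduced here), as are the class and helper names.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class MarkFunctions {
    // Assumption: hypothetical data, marks stored in child elements as in "format #2".
    static final String LIST =
          "<list>"
        + "<student><name>Ann</name><mark>8</mark></student>"
        + "<student><name>Bob</name><mark>14</mark></student>"
        + "<student><name>Cleo</name><mark>20</mark></student>"
        + "<student><name>Dan</name><mark>10</mark></student>"
        + "</list>";

    static final XPath XP = XPathFactory.newInstance().newXPath();

    // Evaluates expr at the document root, asking for the given result type.
    static Object eval(String expr, QName type) {
        try {
            return XP.evaluate(expr, new InputSource(new StringReader(LIST)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    static double number(String expr) { return (Double) eval(expr, XPathConstants.NUMBER); }
    static String string(String expr) { return (String) eval(expr, XPathConstants.STRING); }
    static boolean bool(String expr)  { return (Boolean) eval(expr, XPathConstants.BOOLEAN); }

    public static void main(String[] args) {
        // Average mark: sum and count combined with the div operator.
        System.out.println(number("sum(//mark) div count(//student)"));
        System.out.println(number("count(//student[mark > 12])"));
        // position() and last(), in full and abbreviated form.
        System.out.println(string("/list/student[position() = 1]/name"));
        System.out.println(string("/list/student[1]/name"));
        System.out.println(string("/list/student[position() = last()]/name"));
        System.out.println(string("/list/student[last()]/name"));
        // String functions.
        System.out.println(string("concat(/list/student[1]/name, ': ', /list/student[1]/mark)"));
        System.out.println(bool("contains(/list/student[2]/name, 'ob')"));
    }
}
```

When a node set is requested as a string, XPath 1.0 takes the string value of the first node in document order, which is why the name lookups return a single name.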
-
http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/package-summary.html
Conflict between two different logics :
- XPath is about NodeSets, Nodes, Strings,
Booleans and Numbers,
- Java-DOM knows NodeLists, also Nodes, Strings,
Booleans and Numbers,
but with subtly different meanings.
The XPath API provides a translation between the two... beware of Lost
in Translation !
XPath objects are created as XPath xp = XPathFactory.newInstance().newXPath();
They are endowed with an evaluate method
- with arguments
- a String that represents an XPath expression;
- an InputSource (org.xml.sax.InputSource), or a (DOM-) Document, or a (DOM-) Node, or a (DOM-) NodeList;
- a constant which determines the type of the result
(a static field of class javax.xml.xpath.XPathConstants)
- returning an object that must be cast, either to NodeList, Node, Boolean, String or Double,
according to the type prescribed by the 3rd argument :
XPathConstants.NODESET --> org.w3c.dom.NodeList
XPathConstants.NODE --> org.w3c.dom.Node
XPathConstants.STRING --> java.lang.String
XPathConstants.BOOLEAN --> java.lang.Boolean (class, not the base type boolean)
XPathConstants.NUMBER --> java.lang.Double (class, not the base type double)
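The correspondence between result constants and Java types can be checked with one evaluation per constant. The sketch below assumes a tiny hypothetical document in the spirit of format #1 (name and mark as attributes); the class name ResultTypes and the helper evaluateAll are made up for illustration. It uses a parsed DOM Document as the context rather than an InputSource, and a Node as the context for one relative expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResultTypes {
    // Assumption: hypothetical two-student list, attributes as in "format #1".
    static final String XML =
        "<list><student name='Ann' mark='20'/><student name='Bob' mark='9'/></list>";

    // Evaluates one expression per result constant and returns the five results in order.
    static Object[] evaluateAll() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();

            NodeList all = (NodeList) xp.evaluate("/list/student", doc, XPathConstants.NODESET);
            Node first = (Node) xp.evaluate("/list/student[1]", doc, XPathConstants.NODE);
            // A relative expression, evaluated with the node just found as context.
            String name = (String) xp.evaluate("@name", first, XPathConstants.STRING);
            Boolean hasTopMark =
                    (Boolean) xp.evaluate("/list/student/@mark = 20", doc, XPathConstants.BOOLEAN);
            Double howMany = (Double) xp.evaluate("count(/list/student)", doc, XPathConstants.NUMBER);

            return new Object[] { all, first, name, hasTopMark, howMany };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] r = evaluateAll();
        System.out.println("NODESET -> NodeList of length " + ((NodeList) r[0]).getLength());
        System.out.println("NODE    -> Node named " + ((Node) r[1]).getNodeName());
        System.out.println("STRING  -> " + r[2]);
        System.out.println("BOOLEAN -> " + r[3]);
        System.out.println("NUMBER  -> " + r[4]);
    }
}
```

Asking for a type that does not match the cast (for instance NUMBER cast to NodeList) compiles but fails at run time with a ClassCastException, which is one way to get "lost in translation".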
-
XPath xp = XPathFactory.newInstance().newXPath();
- InputSource from filename :
InputSource ip = new InputSource("NN1.xml");
(names & marks, format #1)
- A suitable XPath expression :
String expr = "/list/student[@mark=20]";
- Choosing XPathConstants.NODESET means we want
the NodeList of the students having the top mark.
NodeList the_best = (NodeList) xp.evaluate(expr, ip, XPathConstants.NODESET);
for( int i = 0; i < the_best.getLength(); i++ ){
// item(i) is a Node; getAttribute belongs to org.w3c.dom.Element, hence the cast
System.out.println(((Element) the_best.item(i)).getAttribute("name"));
}
See here for a variation
on this theme.