XML technologies

A Minimal Introduction to XPath

Jean-François Perrot

  1. About XPath
    1. XPath is a language for specifying sets of paths in an XML tree.
    2. A first approximation of XPath is given by the language of Unix filenames
    3. Use of predicates
    4. A whole library of functions

  2. Java
    1. Package javax.xml.xpath
    2. A simple example

We restrict ourselves here to
For a more complete treatment see Wikipedia, or one of the many tutorials available online.
There used to be an excellent introduction to XPath at O'Reilly, which disappeared this year (2015).
The 10-Minute XPath Tutorial is no replacement for it, but provides a good starting point.

About XPath

  1. XPath is a language for specifying sets of paths in an XML tree.

    A well-formed XPath term is an expression
    (in the sense of arithmetic or boolean expressions in programming languages,
    i.e. a construct that is evaluated to a value, as opposed to a statement, which is executed)
    1. to be evaluated in a given context (a node n in an XML document D),
    2. the value of which may be either
      • a set of paths of D originating at n,
        or (equivalently) the set of nodes of D that are reached from node n by the said paths.
      • a string
      • a number (a floating-point value in XPath 1.0; integers are a special case)
      • a boolean

      Note that the XPath facility in Eclipse works with node sets only!
      For full functionality, install the "Eclipse XPath evaluation" plugin.
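The four kinds of value listed above can be requested explicitly from Java. The following is a minimal sketch, not the course's own code: the class name XPathValueKinds, the helper eval and the three-car stand-in document are assumptions made for illustration (the real Garage.xml is not reproduced here). The XPathConstants names are those of the standard javax.xml.xpath package; note that a number comes back as a Double.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathValueKinds {
    // ASSUMPTION: a three-car stand-in for Garage.xml, not the real file.
    static final String GARAGE =
        "<Garage><Car make='Citro\u00ebn'/><Car make='Renault'/><Car make='Toyota'/></Garage>";

    // Evaluates expr with the document node as context, asking for one kind of value.
    static Object eval(String expr, QName kind) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(GARAGE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, kind);
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // A node set is returned as an org.w3c.dom.NodeList.
        NodeList cars = (NodeList) eval("/Garage/Car", XPathConstants.NODESET);
        // string() of a node set is the string value of its first node.
        String first = (String) eval("string(/Garage/Car/@make)", XPathConstants.STRING);
        // XPath 1.0 numbers are floating-point, hence Double.
        Double howMany = (Double) eval("count(/Garage/Car)", XPathConstants.NUMBER);
        Boolean moreThanTwo = (Boolean) eval("count(/Garage/Car) > 2", XPathConstants.BOOLEAN);
        System.out.println(cars.getLength() + " nodes, first make " + first
                + ", count " + howMany + ", more than two: " + moreThanTwo);
    }
}
```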

    The examples below use the Garage.xml document and are evaluated with xmllint --xpath, taking the root of the document as context.
  2. A first approximation of XPath is given by the language of Unix filenames

    1. Tree structure of directories where tags = directories

      XPath expression "/Garage/Car/Engine/Ignition" evaluates to the set of the 3 <Ignition> nodes of the 3 cars in the garage.

      jfp$ xmllint Garage.xml --xpath "/Garage/Car/Engine/Ignition"
      <Ignition>Working well</Ignition><Ignition>defective</Ignition><Ignition>All right</Ignition>
      jfp$



      A frequently used shorthand is "//tag-name", which selects all nodes with the given tag, wherever they are in the document.

      jfp$ xmllint Garage.xml --xpath "//Transmission"
      <Transmission type="manual" gear_nb="4">
          <GearBox/>
          <FrontAxle/>
          <RearAxle/>
        </Transmission><Transmission type="automatic" gear_nb="5">
          <GearBox/>
          <FrontAxle/>
          <RearAxle/>
        </Transmission><Transmission type="manual" gear_nb="5">
          <GearBox/>
          <FrontAxle/>
          <RearAxle/>
        </Transmission>
      jfp$


    2. Extended with a notation for XML attributes

      XPath expression "/Garage/Car/@make" evaluates to the set of the 3 attribute nodes "make" of the 3 cars in the garage.

      jfp$ xmllint Garage.xml --xpath "/Garage/Car/@make"
      make="Citroën" make="Renault" make="Toyota"
      jfp$




    3. Augmented with the useful function count, which returns the number of nodes in the node set that its argument evaluates to

      XPath expression "count(/Garage/Car/Engine)" evaluates to 3 (as a number)

      jfp$ xmllint Garage.xml --xpath "count(/Garage/Car/Engine)"
      3
      jfp$



      as well as "count(/Garage/Car/Body/@color)"
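The xmllint queries of this section can also be run from Java. This is a hedged sketch: the class GarageQueries and its helpers are hypothetical names, and the inline document is only a stand-in shaped after the outputs shown above (the nesting of Transmission and Body inside Car and the color values are assumptions, since Garage.xml itself is not shown).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GarageQueries {
    // ASSUMPTION: stand-in for Garage.xml; nesting and colors are invented for illustration.
    static final String GARAGE = "<Garage>"
        + "<Car make='Citro\u00ebn'><Engine><Ignition>Working well</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='4'/><Body color='red'/></Car>"
        + "<Car make='Renault'><Engine><Ignition>defective</Ignition></Engine>"
        + "<Transmission type='automatic' gear_nb='5'/><Body color='blue'/></Car>"
        + "<Car make='Toyota'><Engine><Ignition>All right</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='5'/><Body color='white'/></Car>"
        + "</Garage>";

    // Parses the stand-in document and evaluates expr at its root.
    static Object eval(String expr, QName kind) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(GARAGE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, kind);
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    // String values of the selected nodes (text for elements, value for attributes).
    static List<String> select(String expr) {
        NodeList nodes = (NodeList) eval(expr, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    // Numeric result of expr, e.g. of a call to count().
    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        System.out.println(select("/Garage/Car/Engine/Ignition")); // directory-like path
        System.out.println(select("//Transmission").size());       // "//" shorthand
        System.out.println(select("/Garage/Car/@make"));           // attribute nodes
        System.out.println(number("count(/Garage/Car/Engine)"));   // count function
    }
}
```

The document is parsed again on every call, which keeps the sketch short; a real program would parse once and reuse the Document.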

  3. Use of predicates

    Predicates are tests (boolean expressions) written between square brackets; they select those nodes for which the test succeeds.
    Combined with XPath functions they are quite powerful.
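A few predicates, sketched in Java to make the idea concrete. Everything here is illustrative: PredicateQueries and select are hypothetical names, and the inline document is an assumed stand-in for Garage.xml in which Transmission is a direct child of Car. The predicates shown test a child's text, an attribute value, a numeric comparison, a position, and the result of a function call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateQueries {
    // ASSUMPTION: stand-in for Garage.xml, with Transmission directly under Car.
    static final String GARAGE = "<Garage>"
        + "<Car make='Citro\u00ebn'><Engine><Ignition>Working well</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='4'/></Car>"
        + "<Car make='Renault'><Engine><Ignition>defective</Ignition></Engine>"
        + "<Transmission type='automatic' gear_nb='5'/></Car>"
        + "<Car make='Toyota'><Engine><Ignition>All right</Ignition></Engine>"
        + "<Transmission type='manual' gear_nb='5'/></Car>"
        + "</Garage>";

    // String values of the nodes selected by expr, evaluated at the document root.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(GARAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Cars whose ignition is reported as defective.
        System.out.println(select("/Garage/Car[Engine/Ignition='defective']/@make"));
        // Cars with a manual transmission.
        System.out.println(select("/Garage/Car[Transmission/@type='manual']/@make"));
        // Attribute values are converted to numbers for the comparison.
        System.out.println(select("/Garage/Car[Transmission/@gear_nb > 4]/@make"));
        // A bare number is a position test: the second car.
        System.out.println(select("/Garage/Car[2]/@make"));
        // A predicate combined with an XPath function.
        System.out.println(select("/Garage/Car[starts-with(@make, 'T')]/@make"));
    }
}
```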

    Examples with the Garage.xml document (continued)

    Other examples, with Names & marks documents :

  4. A whole library of functions

    See Wikipedia for a complete list.
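To give a taste of the library, here is a small sketch exercising a handful of XPath 1.0 core functions (sum, count, concat, string-length, last, translate). The class XPathFunctions, its helpers num and str, and the three-student document are assumptions made for this illustration; the list/student shape with name and mark attributes follows the expression used in the Java example of these notes.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFunctions {
    // ASSUMPTION: hypothetical names-and-marks data, invented for this sketch.
    static final String MARKS = "<list>"
        + "<student name='Alice' mark='20'/>"
        + "<student name='Bob' mark='12'/>"
        + "<student name='Chloe' mark='8'/>"
        + "</list>";

    static Object eval(String expr, QName kind) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKS)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, kind);
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    // Numeric functions: ask for XPathConstants.NUMBER, which yields a Double.
    static double num(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    // String functions: ask for XPathConstants.STRING.
    static String str(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(num("sum(/list/student/@mark)"));              // total of all marks
        System.out.println(num("count(/list/student[@mark >= 10])"));     // students at 10 or above
        System.out.println(str("concat(/list/student[1]/@name, ': ', /list/student[1]/@mark)"));
        System.out.println(num("string-length(/list/student[last()]/@name)"));
        // translate maps characters one to one; here it upper-cases a name.
        System.out.println(str("translate(/list/student[2]/@name, "
                + "'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"));
    }
}
```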

Java

  1. Package javax.xml.xpath :

    http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/package-summary.html

    Conflict between two different logics :

    The XPath API provides a translation between the two, but beware of getting lost in translation!


    XPath objects are created with XPath xp = XPathFactory.newInstance().newXPath();
    They provide an evaluate method.
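The evaluate method comes in several forms. The sketch below contrasts two of them: the two-argument form, which always returns a String, and the form taking an XPathConstants value, used here on a precompiled XPathExpression. The class EvaluateForms, its method names and the two-student document are hypothetical and serve only as illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EvaluateForms {
    // ASSUMPTION: hypothetical sample in a list/student shape.
    static final String XML =
        "<list><student name='Alice' mark='20'/><student name='Bob' mark='12'/></list>";

    // Two-argument form: the source is parsed on the fly and the result is a String.
    static String asString(String expr) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    // Compiled expression evaluated on a DOM Document; the constant picks the Java type.
    static int countMatches(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // A node set converted to a String yields the string value of its first node.
        System.out.println(asString("/list/student/@name"));
        System.out.println(countMatches("/list/student[@mark=20]"));
    }
}
```

Compiling is worthwhile when the same expression is evaluated against many documents or context nodes.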

  2. A simple example

    1. XPath xp = XPathFactory.newInstance().newXPath();

    2. InputSource from filename : InputSource ip = new InputSource("NN1.xml"); (names & marks, format #1)

    3. A suitable XPath expression: String expr = "/list/student[@mark=20]";

    4. Choosing XPathConstants.NODESET means we want the NodeList of the students having the top mark.

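      // needs org.w3c.dom.NodeList, org.w3c.dom.Element and javax.xml.xpath.XPathConstants
      NodeList the_best = (NodeList) xp.evaluate(expr, ip, XPathConstants.NODESET);
      for( int i = 0; i < the_best.getLength(); i++ ){
         // item(i) is a Node, which has no getAttribute: cast it to Element
         System.out.println(((Element) the_best.item(i)).getAttribute("name"));
      }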


    See here for a variation on this theme.
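The four steps above can be assembled into one runnable class. This is a sketch under stated assumptions: TopMarks and best are hypothetical names, and the inline SAMPLE document is invented to stand in for NN1.xml, whose actual content is not shown in these notes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TopMarks {
    // ASSUMPTION: hypothetical content standing in for NN1.xml.
    static final String SAMPLE = "<list>"
        + "<student name='Alice' mark='20'/>"
        + "<student name='Bob' mark='12'/>"
        + "<student name='Dan' mark='20'/>"
        + "</list>";

    // Names of the students whose mark attribute equals 20.
    static List<String> best(InputSource ip) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            String expr = "/list/student[@mark=20]";
            NodeList theBest = (NodeList) xp.evaluate(expr, ip, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < theBest.getLength(); i++) {
                // The selected nodes are elements, so the cast to Element is safe here.
                names.add(((Element) theBest.item(i)).getAttribute("name"));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // With the real file one would pass new InputSource("NN1.xml") instead.
        System.out.println(best(new InputSource(new StringReader(SAMPLE))));
    }
}
```

An InputSource built on a Reader can be consumed only once, so a fresh one is needed for each evaluation.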