Introductory course at ESSLLI 2002

Annotation of Language Resources

Lecture II.

XML-Related Recommendations

Tomaž Erjavec
Department of Intelligent Systems
Institute Jožef Stefan
Jamova 39, SI-1000 Ljubljana
Slovenia

Abstract

This lecture discusses developments related to XML, in particular XML Schemas, XML Namespaces, XPath, and the XML transformation language, XSLT.


1. XML-Related Proposals

In this part we look at the following XML-related proposals:
  • XML Namespaces
  • XML Schemas
  • XPath
  • XSLT
  • XSL

1.1. XML Namespaces

1.1.1. XML Namespaces: motivation

A single XML document could usefully contain elements and attributes ("markup vocabulary") that are defined for and used by multiple software modules.
Such documents pose problems of recognition and collision. Software modules need to be able to recognise the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.
Therefore document constructs should have universal names, whose scope extends beyond their containing document; such universal names are defined by the XML Namespaces specification (January 1999).

1.1.2. XML Namespaces: an example

An example:

<?xml version="1.0" ?>
<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:nms="http://www.names.net/address">
 <head><title>Addresses</title></head>
 <body>
   <nms:addresses nms:version="1.0">
   <hr/>
   <nms:person>
     <nms:title>Mr.</nms:title>
     <nms:first>Simon</nms:first>
     <nms:last>Schuster</nms:last>
   </nms:person>
   <hr/>
<!-- ... -->
   </nms:addresses>
 </body>
</html>

  • XML Namespaces provide a two-part naming system for element types and attributes
  • The xmlns prefixed attributes give the URI and - except for the default namespace - the local prefix of the namespaces
  • The meaning of the prefix of qualified names is inherited - and possibly overridden - by child elements
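The two-part names can be observed with any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which reports every element and attribute name as {namespace-URI}local-name. The shortened document and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A shortened version of the example document (the address list is closed here).
DOC = """<?xml version="1.0" ?>
<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:nms="http://www.names.net/address">
 <head><title>Addresses</title></head>
 <body>
   <nms:addresses nms:version="1.0">
     <nms:person>
       <nms:title>Mr.</nms:title>
     </nms:person>
   </nms:addresses>
 </body>
</html>"""

root = ET.fromstring(DOC)

HTML = "{http://www.w3.org/HTML/1998/html4}"
NMS = "{http://www.names.net/address}"

# Both <title> elements share a local name but have different universal names.
html_title = root.find(f"{HTML}head/{HTML}title")
nms_title = root.find(f".//{NMS}title")
print(html_title.tag, "->", html_title.text)
print(nms_title.tag, "->", nms_title.text)

# Prefixed attributes are reported with their universal name as well.
addresses = root.find(f".//{NMS}addresses")
print(addresses.attrib)
```

The prefix nms never appears in the parsed names; only the URI and the local part do, which is why the prefix can be chosen freely in each document.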

1.1.3. XML Namespace myths

There is less to XML Namespaces than meets the eye!
  • The namespace URI does not need to point to a DTD, or indeed to any existing resource; it is never resolved:
    
    <?xml version="1.0" ?>
    <html xmlns="http://completely-silly-address/ha/ha"
          xmlns:nms="brrr://another-silly-address/snicker">
    ...
    
    
  • The XML Namespaces recommendation is compatible with XML 1.0, hence it does not provide a way to validate a document against two or more DTDs; in fact, it is almost impossible to validate a document using XML Namespaces against a DTD:
    
    <?xml version="1.0" ?>
    <html xmlns="http://www.w3.org/HTML/1998/html4"
          xmlns:nms="http://www.names.net/address">
     <heard><tutle>Addresses</tutle></heard>
     <bodi>
       <nms:addresses nms:version="1.0">
       <gr/>
       ....
    
    
  • An overview of common misconceptions about XML namespaces is given in Ronald Bourret: Namespace Myths Exploded (2000).

1.2. XML Schemas

1.2.1. XML Schemas: beyond DTDs

Document Type Definitions (DTDs) are the traditional way to declare document types and to validate SGML/XML documents. However, they have two problems:
  • DTDs can impose only weak constraints on attribute and element content
  • DTDs themselves are not written in XML, so tools to process (edit, validate, present) XML do not work with them
Several proposals exist to address these shortcomings:
  • Academia Sinica's Schematron;
  • OASIS / ISO DIS RELAX NG (Regular Language Description for XML, Next Generation): a unification of James Clark's Tree Regular Expressions for XML (TREX) and the (JIS/ISO) Regular Language Description for XML (RELAX) schema languages;
  • W3C XML Schema (May 2001).
Software exists that implements all of the above proposals and can also convert DTDs to schemas.

1.2.2. XML Schemas: an example XML document

Example XML document from the W3C Schema Primer:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

1.2.3. XML Schemas: an example schema

The example XML Schema from the W3C Schema Primer:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:annotation>
  <xsd:documentation xml:lang="en">
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
 </xsd:annotation>

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:element name="comment" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN"
     fixed="US"/>
 </xsd:complexType>

 <xsd:complexType name="Items">
  <xsd:sequence>
   <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
     <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
       <xsd:simpleType>
        <xsd:restriction base="xsd:positiveInteger">
         <xsd:maxExclusive value="100"/>
        </xsd:restriction>
       </xsd:simpleType>
      </xsd:element>
      <xsd:element name="USPrice"  type="xsd:decimal"/>
      <xsd:element ref="comment"   minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>
     <xsd:attribute name="partNum" type="SKU" use="required"/>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>

 <!-- Stock Keeping Unit, a code for identifying products -->
 <xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
 </xsd:simpleType>

</xsd:schema>
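To see what the two simple-type restrictions in this schema accept, they can be mimicked by hand. The sketch below is not a schema validator: it only re-implements the SKU pattern and the quantity bound in Python, with simplified lexical checks, and the function names are hypothetical.

```python
import re

# XML Schema patterns are implicitly anchored, hence fullmatch() below.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_valid_sku(value: str) -> bool:
    """Mimics the SKU simple type: three digits, a hyphen, two capitals."""
    return SKU_PATTERN.fullmatch(value) is not None


def is_valid_quantity(value: str) -> bool:
    """Mimics positiveInteger restricted by maxExclusive="100", i.e. 1..99."""
    # Simplified lexical check: optional '+', then ASCII digits only.
    text = value.strip()
    if not re.fullmatch(r"\+?[0-9]+", text):
        return False
    return 0 < int(text) < 100


for sku in ("872-AA", "926-AA", "872-aa", "1872-AA"):
    print(sku, is_valid_sku(sku))
for quantity in ("1", "99", "100", "0", "1.5"):
    print(quantity, is_valid_quantity(quantity))
```

Both partNum values in the example purchase order pass the SKU check, whereas a DTD could only have declared the attribute as CDATA or NMTOKEN.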

1.3. Formatting and Transforming XML

This part of the course deals with the XSL family of W3C recommendations, in particular XPath, XSLT, and XSL. The structure and examples closely follow the excellent book:
Neil Bradley: The XSL Companion. Addison-Wesley, 2000.

1.3.1. Formatting and Transforming XML: Introduction

XML markup is supposed to be descriptive (e.g. <title>) rather than presentational (e.g. <bold>). But, sooner or later, we do want to render the documents! How do we do this?
  • rendering built directly into software (e.g. HTML browsers)
  • direct conversion to output format with XML aware transformation software (e.g. with XSLT to HTML)
  • conversion to intermediary, abstract presentation oriented format, and from there to final output format (e.g. with XSLT to XSL to PDF)
Styling languages:
  • HTML: CSS (Cascading Style Sheets)
  • SGML: DSSSL (Document Style Semantics and Specification Language)
  • XML: XSL (eXtensible Stylesheet Language)

1.3.2. XSL history

A stylesheet language named XSL was first proposed to the W3C in 1997. But during its gestation, the proposal was pulled apart into three separate standards:
XPath (V1.0, November 1999)
defines a mechanism for locating information in XML documents, and has many other uses besides that in formatting documents
XSLT (V1.0, November 1999)
defines a means of transforming XML documents into other data formats (XML or otherwise), including (but not limited to) formatting languages
XSL (V1.0, October 2001)
is now properly used only to name a proposed standard for embedding formatting information in documents using XML elements.

1.4. XPath

1.4.1. XPath: introduction

XML Path Language (XPath) Version 1.0, W3C Recommendation November 1999 (Version 2.0, W3C Working Draft 30 April 2002).
  • The primary purpose of XPath is to address parts of an XML document; however, it has a natural subset that can be used for testing whether or not a node matches a pattern.
  • XPath uses a compact, non-XML syntax; it gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.

1.4.2. XPath: introduction 2

  • XPath operates on the abstract, logical structure of an XML document (its InfoSet), rather than its surface syntax; it models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes.
  • The primary syntactic construct in XPath is the expression, which is evaluated to yield an object of type: node-set (an unordered collection of nodes without duplicates), boolean, number, or string.
  • The full syntax of XPath expressions is cumbersome, so various abbreviations are allowed.
  • Expression evaluation occurs with respect to its context node.

1.4.3. Examples I

*
selects all element children of the context node
@name
selects the attribute name of the context node
@*
selects all the attributes of the context node
para[1]
selects the first para child of the context node
*/para
selects all para grandchildren of the context node
/doc/chapter[5]/section[2]
selects the second section of the fifth chapter of the doc child of the root node
//para
selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
//olist/item
selects all the item elements in the same document as the context node that have an olist parent
.
selects the context node
.//para
selects the para element descendants of the context node
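Several of these expressions can be tried out with Python's standard xml.etree.ElementTree module. Note that this module implements only a limited subset of XPath: for instance, it cannot select attribute nodes such as @name, and paths are always evaluated relative to the element they are called on. The sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <chapter>
    <title>Introduction</title>
    <para>p1</para>
    <para>p2</para>
    <section><para>p3</para></section>
  </chapter>
  <chapter>
    <title>Methods</title>
    <para>p4</para>
  </chapter>
</doc>"""

doc = ET.fromstring(DOC)
chapter = doc.find("chapter")  # context node: the first chapter

# *        all element children of the context node
child_tags = [e.tag for e in chapter.findall("*")]
# para[1]  the first para child of the context node
first_para = chapter.find("para[1]").text
# */para   all para grandchildren of the context node (here: of doc)
grandchildren = [p.text for p in doc.findall("*/para")]
# .//para  all para descendants of the context node
descendants = [p.text for p in chapter.findall(".//para")]
# .        the context node itself
context = chapter.find(".")

print(child_tags)
print(first_para)
print(grandchildren)
print(descendants)
print(context is chapter)
```

The difference between */para and .//para shows up in p3, which sits inside a section and is therefore a descendant of the first chapter but not a grandchild of doc.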

1.4.4. Examples II

..
selects the parent of the context node
../@lang
selects the lang attribute of the parent of the context node
para[@type="warning"][5]
selects the fifth para child of the context node that has a type attribute with value warning
para[5][@type="warning"]
selects the fifth para child of the context node if that child has a type attribute with value warning
chapter[title="Introduction"]
selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
chapter[title]
selects the chapter children of the context node that have one or more title children
employee[@secretary and @assistant]
selects all the employee children of the context node that have both a secretary attribute and an assistant attribute
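The difference between para[@type="warning"][5] and para[5][@type="warning"] comes from predicates being applied left to right, each one filtering the node list produced by the previous one. This can be imitated with plain Python lists; the data and the helper names below are invented for illustration.

```python
# Seven para children of a context node, numbered in document order.
types = ["warning", "note", "warning", "warning", "note", "warning", "warning"]
paras = [{"n": i, "type": t} for i, t in enumerate(types, start=1)]


def with_type(nodes, value):
    """Predicate [@type=value]: keep nodes whose type attribute equals value."""
    return [p for p in nodes if p["type"] == value]


def at_position(nodes, k):
    """Predicate [k]: keep the k-th node of the current list (1-based), if any."""
    return nodes[k - 1:k]


# para[@type="warning"][5]: filter by type first, then take the fifth survivor.
fifth_warning = at_position(with_type(paras, "warning"), 5)
# para[5][@type="warning"]: take the fifth para, then test its type.
fifth_if_warning = with_type(at_position(paras, 5), "warning")

print([p["n"] for p in fifth_warning])     # [7]
print([p["n"] for p in fifth_if_warning])  # []
```

With this data the fifth warning is the seventh para, while the fifth para is a note, so the second expression selects nothing.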

1.4.5. XPath expressions

An XPath expression contains one or more location steps, separated by slashes. Each location step has the following form:
axis-name :: node-test [predicate]*
For example:
child::para[attribute::type="warning"]
The XPath axis contains a part of the document, defined from the perspective of the context node. The node test makes a selection from the nodes on that axis. By adding predicates, it is possible to select a subset from these nodes. If the expression in the predicate returns true, the node remains in the selected set, otherwise it is removed.
  • Some axis names: child, parent, descendant, ancestor, self, ancestor-or-self, following-sibling, following, attribute, ...
  • The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document.
  • Some node tests: literal name, *, text(), ...
  • A predicate is an expression; it is composed of values, operators and other XPath expressions. XPath also defines a set of functions for use in predicates.

1.4.6. Location steps

Location steps are similar to file system addressing (child:: axes below omitted):
A
selects all elements A that are children of the context node
A/B
selects B elements that are children of A
A//B
selects B elements that are descendants of A
/A
selects the root element A
/A//B
selects B elements that are descendants of the root element A
//*
selects all elements of the tree

1.4.7. XPath functions

XPath also defines a number of functions. Here are some examples of their use in expressions:
child::text()
selects all text node children of the context node
child::para[last()]
selects the last para child of the context node
child::para[position()=1]
selects the first para child of the context node

1.4.8. Abbreviated syntax expressions

Full syntax is cumbersome, so various abbreviations are allowed:
  • child:: can be omitted from a location step
  • attribute:: can be abbreviated to @
  • position()=n can be abbreviated to n
  • etc.
For example,
child::para[position()=1][attribute::type="warning"] can be abbreviated to
para[1][@type="warning"]

1.4.9. XPath string functions

A number of functions are also defined for strings; some examples are:
starts-with(string, string)
returns true if the first argument string starts with the second argument string, and otherwise returns false.
contains(string, string)
returns true if the first argument string contains the second argument string, and otherwise returns false.
concat(string, string, string*)
returns the concatenation of all the strings given as arguments.
substring-before(string, string)
returns the substring of the first argument string that precedes the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string. For example, substring-before("1999/04/01","/") returns 1999.
normalize-space(string)
returns the argument string with whitespace normalised by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space.
translate(string, string, string)
returns the first argument string with occurrences of characters in the second argument string replaced by the character at the corresponding position in the third argument string. For example, translate("bar","abc","ABC") returns the string BAr.
etc.
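The behaviour of these functions can be approximated in a few lines of Python. The following is a sketch of the semantics described above, not an XPath implementation; the function names simply mirror the XPath ones with underscores.

```python
import re

XML_WS = " \t\r\n"  # the whitespace characters recognised by XML 1.0


def starts_with(s, prefix):
    return s.startswith(prefix)


def contains(s, sub):
    return sub in s


def concat(*parts):
    return "".join(parts)


def substring_before(s, sep):
    # An empty separator is found at the very start, so nothing precedes it.
    if not sep:
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def normalize_space(s):
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XML_WS))


def translate(s, src, dst):
    table = {}
    for i, ch in enumerate(src):
        # The first occurrence of a character in src decides its mapping;
        # characters without a counterpart in dst are deleted.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


print(substring_before("1999/04/01", "/"))  # 1999
print(translate("bar", "abc", "ABC"))       # BAr
print(normalize_space("  Hello \n  world  "))
```

The deletion rule in translate is often used to strip characters: translate("--aaa--", "abc-", "ABC") gives AAA, because the hyphen has no counterpart in the third argument.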

1.4.10. XPath Version 2.0

XML Path Language (XPath) 2.0 W3C Working Draft 30 April 2002
  • XPath 2.0 is the joint work of the XSL and XML Query Working Groups and is derived from both XPath 1.0 and XQuery. XPath 2.0 and XQuery are closely related, sharing much of the same expression syntax and semantics, and much of the text found in the two Working Drafts is identical.
  • XPath 2.0 extends the XPath 1.0 expression types (node-set, boolean, number, string) to all (19) XML Schema primitive types
  • XPath 2.0 introduces new operations and constructs: for, if then else, every, intersect, etc.

1.5. XSLT

1.5.1. XSLT: introduction

XSL Transformations (XSLT) Version 1.0 W3C Recommendation; 16 November 1999 (Version 2.0, W3C Working Draft; April 2002)
XSLT defines a means of transforming XML documents into other data formats. It can be used:
  • to transform XML documents into documents using the XSL stylesheet language, i.e. for formatting XML documents
  • as a general XML transformation language, e.g. to exchange data between applications
  • to convert XML into HTML, which is probably its most popular use.
Processors:
  • An XSLT processor converts XML input into XML/XSL output, when supplied with an XSLT stylesheet
  • An XSL processor converts XSL documents into device-dependent output formats.

1.5.2. Templates


<template match="para">  
  <apply-templates/>
</template>

  • A template specifies the transformation that is to be applied to a specific part of the source document
  • The match attribute, whose value is an XSLT pattern (a subset of XPath expressions), specifies which nodes the template applies to; matching is done with reference to the context node
  • <apply-templates/> triggers processing of the child nodes of the context node

1.5.3. Implied templates

  • Text nodes are output by default, just as if the following template were included:
    
    <template match="text()">  
      <value-of select="."/>
    </template>
    
    
  • Elements for which no template is given are processed by default, just as if the following template were included:
    
    <template match="*">  
      <apply-templates/>
    </template>
    
    
  • To ignore an element, e.g. <hide>, the following is therefore needed:
    
    <template match="hide"/>
    
    

1.5.4. Selective and repeated processing

  • It is possible to process only selected elements:
    
    <template match="chapter">  
      <apply-templates select="title"/>
      <apply-templates select="para"/>
    </template>
    
    
  • This processes the <title> twice:
    
    <template match="chapter">  
      <apply-templates select="title"/>
      <apply-templates/>
    </template>
    
    

1.5.5. Prefixes and suffixes

  • Text can be inserted before and after the node:
    
    <template match="chapter">  
      This text will appear before the content of chapter
      <apply-templates/>
      This text will appear after the content of chapter
    </template>
    
    
  • Using the <text> element, which allows for better control of whitespace:
    
    <template match="chapter">  
      <text> "</text><apply-templates/><text>" </text>
    </template>
    
    

1.5.6. Tag replacement and namespaces

Because the main purpose of XSLT is to convert to XML, replacing or inserting tags is very common. This can be done in two different ways:
  • Using the XSLT <element>:
    
    <template match="book">  
      <element name="HTML">
        <element name="HEAD">
          <element name="TITLE">The Title</element>
        </element>
        <element name="BODY">
          <apply-templates/>
        </element>
      </element>
    </template>
    
    
  • Using XML Namespaces:
    
    <xsl:template match="book">  
      <HTML>
        <HEAD><TITLE>The Title</TITLE></HEAD>
        <BODY>
          <xsl:apply-templates/>
        </BODY>
      </HTML>
    </xsl:template>
    
    

1.5.7. Element values

Input:
<para>Hello <hi>world</hi>!</para>
  • Copying document fragments:
    
    <xsl:template match="para">  
      <xsl:copy-of select="."/>
    </xsl:template>
    
    
    Output: <para>Hello <hi>world</hi>!</para>
  • Accessing content element as a string:
    
    <xsl:template match="para">  
      <xsl:value-of select="."/>
    </xsl:template>
    
    
    Output: Hello world!
  • Accessing specific elements:
    
    <xsl:template match="para">  
      <xsl:value-of select="hi"/>
    </xsl:template>
    
    
    Output: world

1.5.8. Attribute values

  • Accessing attribute content:
    Input:    
    <para type="important">Hello world!</para>
    
    Template: 
    
      <xsl:template match="para">  
        <P>
        [TYPE: <xsl:value-of select="@type"/>]
        <xsl:apply-templates/>
        </P>
      </xsl:template>
    
    
    Output:    
    
      <P>[TYPE: important] Hello world!</P>
    

1.5.9. Breaking well-formedness

An XSLT stylesheet must be a well-formed XML document, and it also outputs only well-formed XML documents. This can sometimes be problematic:

Input:    
  <first>John</first>  <last>Smith</last>
  <first>Frank</first> <last>Furter</last>

Intended output:    
  <p>John Smith</p>
  <p>Frank Furter</p>

First try:

  <xsl:template match="first">  
    <P> <xsl:value-of select="."/>   <!-- WRONG! -->
  </xsl:template>

  <xsl:template match="last">  
    <xsl:value-of select="."/> </P>  <!-- WRONG! -->
  </xsl:template>

1.5.10. Second try


<xsl:template match="first">  
  &lt;p&gt; <xsl:value-of select="."/>  <!-- Escape <p> -->
</xsl:template>
<xsl:template match="last">  
  <xsl:value-of select="."/> &lt;/p&gt; <!-- Escape </p> -->
</xsl:template>

However, this doesn't work either, because the escaped markup is written to the output as text:

&lt;p&gt; John Smith &lt;/p&gt; 
&lt;p&gt; Frank Furter &lt;/p&gt; 

1.5.11. Disable output escaping

Output escaping can be disabled:

<xsl:template match="first">  
  <xsl:text disable-output-escaping="yes">&lt;p&gt;</xsl:text> 
  <xsl:value-of select="."/>
</xsl:template>
<xsl:template match="last">  
  <xsl:value-of select="."/> 
  <xsl:text disable-output-escaping="yes">&lt;/p&gt;</xsl:text> 
</xsl:template>

Note: the perceived need for disable-output-escaping (D-O-E) usually comes from thinking about the transformation in the wrong way; XSLT is not about writing out start and end tags in a linear stream, but about transforming one tree structure into another.
So, a better way of implementing the transformation would be:


  ...
  <xsl:apply-templates select="first"/>
  ...

<xsl:template match="first">
  <P>
    <xsl:apply-templates/>
    <xsl:apply-templates select="following-sibling::last[1]"/>
  </P>
</xsl:template>
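The same tree-to-tree idea can be sketched outside XSLT. The following Python fragment, using the standard xml.etree.ElementTree module, builds each P element as a node and fills it from a first element and the last sibling that follows it, instead of printing start and end tags. The wrapper elements names and body are hypothetical, and the space between the two names is added explicitly here.

```python
import xml.etree.ElementTree as ET

# <names> and <body> are hypothetical wrapper elements added for this sketch.
src = ET.fromstring(
    "<names>"
    "<first>John</first><last>Smith</last>"
    "<first>Frank</first><last>Furter</last>"
    "</names>"
)

out = ET.Element("body")
siblings = list(src)
for i, node in enumerate(siblings):
    if node.tag != "first":
        continue
    # Counterpart of following-sibling::last[1]
    last = next((s for s in siblings[i + 1:] if s.tag == "last"), None)
    p = ET.SubElement(out, "P")
    p.text = node.text if last is None else f"{node.text} {last.text}"

result = ET.tostring(out, encoding="unicode")
print(result)
```

Because every P is created as a complete node, the result cannot be ill-formed, which is the property that disabling output escaping gives up.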

1.5.12. Stylesheets

How do we specify a complete stylesheet?
  • Root XSLT stylesheet element is
    <xsl:stylesheet> ... </xsl:stylesheet>
  • To indicate that XSLT can be used for more than just styling, an equivalent element is:
    <xsl:transform> ... </xsl:transform>
    
  • Typical form:
    
    <xsl:stylesheet 
         version="1.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    .
    .
    .
    </xsl:stylesheet>
    
    

1.5.13. Invoking stylesheets

There are three ways of invoking stylesheets:
  • Unlinked stylesheet:
    
    saxon esslli02.xml tlslides.xsl > esslli02.html
    
    
  • Referenced stylesheet:
    
    <?xml-stylesheet type="text/xsl" href="tlslides.xsl"?>
    <!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd'>
    
    
  • Embedded stylesheet:
    
    <?xml-stylesheet type="text/xsl" href="#localStyle"?>
    <!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd'>
    <TEI.2>
      <xsl:stylesheet version="1.0"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
           id="localStyle">
      ...
      </xsl:stylesheet>
    </TEI.2>
    
    

1.5.14. Top-level elements

These elements are children of <stylesheet>:
  • External definitions:
    
    <xsl:import href="tbl1.xsl"/>  <!--first element, included at end-->
    <xsl:include href="tbl2.xsl"/> <!--included here-->
    
  • Output specification:
    
    <xsl:output 
       method="xml"
       version="1.0"
       encoding="ISO-8859-1"
       standalone="no"
       doctype-system="tei2.dtd"
       doctype-public="-//TEI P3//DTD Main Document Type//EN"
       indent="yes"
       cdata-section-elements="code eg"
       media-type="text/xml"/>
    
    
  • Treatment of element whitespace:
    
    <xsl:preserve-space elements="head p"/>
    <xsl:strip-space elements="div"/>
    
    
  • Also some others, in particular, <template>

1.5.15. Contextual formatting

As the values of the match attribute are patterns, templates can be sensitive to the context of an element:
  • Trivial case:
    
    <xsl:template match="div">  ... </xsl:template>
    <xsl:template match="head"> ... </xsl:template>
    <xsl:template match="p">    ... </xsl:template>
    
    
  • Specific ancestor, child and sibling:
    
    <xsl:template match="A//X">   ... </xsl:template>
    <xsl:template match="X[C]">   ... </xsl:template>
    <xsl:template match="P[S]/X"> ... </xsl:template>
    
    
  • Specific attribute and attribute value:
    
    <xsl:template match="X[@a]">   ... </xsl:template>
    <xsl:template match="X[@a='v']"> ... </xsl:template>
    
    

1.5.16. Template priorities

Only one XSLT template can apply to a specific instance of an element. If two or more templates match an element, the conflict is resolved using template priority:
  • With an explicit attribute:
    
    <xsl:template match="para//emph"  priority="1"> ... </xsl:template>
    <xsl:template match="quote//emph" priority="2"> ... </xsl:template>
    
    
  • With default values:
    priority="-0.5"
    Simple node test (match="*")
    priority="-0.25"
    Any name in a given namespace (match="tei:*")
    priority="0"
    Element name, with or without a namespace prefix (match="p", match="tei:p")
    priority="0.5"
    Other cases, i.e. contextual match (match="p/emph")
  • Conflict resolution: if two templates remain applicable, the processor usually issues a warning and selects the one closer to the end of the stylesheet.
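The selection rule can be summarised in a few lines of Python. This is a simplified sketch that ignores import precedence and assumes the set of matching templates has already been determined; the Template class and the choose function are hypothetical names.

```python
from typing import NamedTuple


class Template(NamedTuple):
    match: str        # the match pattern
    priority: float   # explicit or default priority
    position: int     # order of appearance in the stylesheet


def choose(applicable):
    """Highest priority wins; on a tie, the template closest to the end."""
    return max(applicable, key=lambda t: (t.priority, t.position))


# An <emph> inside a <quote> inside a <para> matches both patterns.
winner = choose([
    Template("para//emph", 1.0, 0),
    Template("quote//emph", 2.0, 1),
])

# Without explicit priorities both contextual patterns default to 0.5,
# so the later template is selected.
tie = choose([
    Template("para//emph", 0.5, 0),
    Template("quote//emph", 0.5, 1),
])

print(winner.match)
print(tie.match)
```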

1.5.17. Modes

The same content may be needed in several parts of the output, but formatted differently in each. This context-dependent behaviour can be achieved using modes:

  <xsl:template match="title"> 
    <!-- formatting for <title> in the body of document -->
  </xsl:template>

  <xsl:template match="title" mode="toc">
    <!-- formatting for <title> in the table of contents -->
  </xsl:template>

  ...

    <!-- Generating the table of contents: -->
    <xsl:apply-templates mode="toc" select="//title"/>

1.5.18. Attribute values

Input:

<image file="house.jpg" x="100" y="100">My house</image>

Output:

<IMG SRC="house.jpg" HEIGHT="100" WIDTH="100" ALT="My house"/>

XSLT element <attribute> can be used:

<xsl:template match="image">
  <xsl:element name="IMG">
    <xsl:attribute name="SRC"><xsl:value-of select="@file"/></xsl:attribute>
    <xsl:attribute name="HEIGHT"><xsl:value-of select="@y"/></xsl:attribute>
    <xsl:attribute name="WIDTH"><xsl:value-of select="@x"/></xsl:attribute>
    <xsl:attribute name="ALT"><xsl:value-of select="."/></xsl:attribute>
  </xsl:element>
</xsl:template>

Shorthand notation: attribute expressions enclosed in curly brackets:

<xsl:template match="image">
  <IMG SRC="{@file}" HEIGHT="{@y}" WIDTH="{@x}" ALT="{text()}"/>
</xsl:template>

1.5.19. Conditional constructs

XSLT provides two elements to select optional pieces of a template:
  • If statement:
    
    <xsl:if test="not(position() = last())">
      <xsl:text>, </xsl:text>
    </xsl:if>
    
    
  • Multiple choices:
    
    <xsl:choose>
      <xsl:when test="@type='error'">
         <FONT color="red"><xsl:apply-templates/></FONT>
      </xsl:when>
      <xsl:when test="@type='warning'">
         <FONT color="yellow"><xsl:apply-templates/></FONT>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
    
    

1.5.20. Sorting

Sorting is specified within <apply-templates>:

Input:
  <people>
    <name>
      <first>John</first>
      <last>Smith</last> 
      <age>53</age>
    </name>
    <name>
      <first>Frank</first>
      <last>Furter</last>
      <age>35</age>
    </name>
  </people>

Template:
  <xsl:template match="people">
    <DIV>
      <xsl:apply-templates>
        <xsl:sort select="last" language="en" />
        <xsl:sort select="age" data-type="number" order="descending"/>
      </xsl:apply-templates>
    </DIV>
  </xsl:template>
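The effect of the two sort keys (last name ascending, then age as a number, descending) can be imitated in Python. This is only a sketch: it compares strings by code point rather than with the collation requested by language="en", and a third, invented person is added so that the secondary key makes a difference.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""<people>
  <name><first>John</first><last>Smith</last><age>53</age></name>
  <name><first>Frank</first><last>Furter</last><age>35</age></name>
  <name><first>Anna</first><last>Smith</last><age>61</age></name>
</people>""")


def sort_key(name):
    # Primary key: <last> as text, ascending.
    # Secondary key: <age> as a number, descending (hence the minus sign).
    return (name.findtext("last"), -float(name.findtext("age")))


ordered = sorted(people.findall("name"), key=sort_key)
result = [(n.findtext("first"), n.findtext("last")) for n in ordered]
print(result)
```

The two Smiths are kept together by the first key and ordered by the second, so the older one comes first.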

1.5.21. Numbering

Numbering is specified in <template> with <number>:
  • The number is the position of the element within its list of sibling elements:
    
    Input:                            Output:                  
      <people>                            
        <name>John Smith</name>           1) John Smith  
        <name>Frank Furter</name>         2) Frank Furter
      </people>
    
      <xsl:template match="name">
        <xsl:number/> <xsl:text>) </xsl:text>
        <xsl:apply-templates/>
      </xsl:template>
    
    
  • Formatting:
    
        <xsl:number format="a"/>. <xsl:apply-templates/>
    Output:
        a. John Smith
        b. Frank Furter
    
        <xsl:number format="(i)"/> <xsl:apply-templates/>
    Output:
        (i) John Smith
        (ii) Frank Furter
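How the format tokens a and i turn a position into a label can be sketched in Python. The helper names are hypothetical, and only these two lower-case tokens are covered.

```python
def to_alpha(n):
    """format="a": 1 -> a, 2 -> b, ..., 26 -> z, 27 -> aa."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def to_roman(n):
    """format="i": 1 -> i, 2 -> ii, 4 -> iv, ..."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = ""
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out += symbol * count
    return out


names = ["John Smith", "Frank Furter"]
alpha_lines = [f"{to_alpha(i)}. {name}" for i, name in enumerate(names, start=1)]
roman_lines = [f"({to_roman(i)}) {name}" for i, name in enumerate(names, start=1)]
print("\n".join(alpha_lines + roman_lines))
```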
    
    

1.5.22. Advanced numbering

  • Manipulating the counter:
    
        <xsl:number value="position()" format="1) "/>
    Output:
        1) John Smith
        2) Frank Furter
    
        <xsl:number value="last() + 1 - position()" format="1) "/>
    Output:
        2) John Smith
        1) Frank Furter
    
    
  • Element selection:
    • Count only those that have status different from ignore:
      <xsl:number count="item[not(@status='ignore')]"/>
    • Count both normal and special:
      <xsl:number count="normal | special"/>
    • Count ancestor element:
      
        <xsl:template match="title">
          <H1>
            <xsl:number count="div1"/>) <xsl:apply-templates/>
          </H1>
        </xsl:template>
      
    • Multipart numbering:
      
          <xsl:number count="div1"/>.
          <xsl:number count="div2"/>)
      
      equivalently:
      
          <xsl:number level="multiple" count="div1 | div2" format="1.1)"/>
      
    • Document-wide numbering:
      
        <xsl:template match="table/title">
          <TITLE>
            <xsl:number level="any" count="table"/>
            <xsl:apply-templates/>
          </TITLE>
        </xsl:template>
      

1.5.23. Linking with IDs

XML IDs can be used in XSLT:
  • The ID attribute must be declared in DTD:
    <!ATTLIST div name ID #REQUIRED>
  • IDs are accessed with the id() XPath function:
    <xsl:template match="id('chp-intro')"> ... </xsl:template>
  • The argument of id() may be a whitespace-separated list of IDs (cf. IDREFS):
    <xsl:template match="id('chp-intro chp-conc')"> ... </xsl:template>

1.5.24. Linking with keys

A more flexible method is possible by using keys:
  • Keys do not have to be defined in a DTD, be stored in an attribute, be unique, be XML names or refer to only one element.
  • Keys are top-level elements and are defined by a name, elements that are referred to, and what part of them is considered the identifier value:
    <xsl:key name="Personnel" match="people/name" use="last"/>
  • Keys are accessed by using the key() XPath function:
    <xsl:template match="key('Personnel', 'Smith')"> ... </xsl:template>
  • Keys are just a shorthand for a predicate, but XSLT software might process keys more efficiently.
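Why a key may be faster than the equivalent predicate can be illustrated with a dictionary. The sketch below builds an index once and then answers lookups from it, whereas the predicate scans all candidates each time. The data is invented and the code only mimics what an XSLT processor might do internally.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

people = ET.fromstring("""<people>
  <name><first>John</first><last>Smith</last></name>
  <name><first>Frank</first><last>Furter</last></name>
  <name><first>Jane</first><last>Smith</last></name>
</people>""")

# Counterpart of <xsl:key name="Personnel" match="people/name" use="last"/>:
# build the index once, mapping each key value to all matching elements.
personnel = defaultdict(list)
for name in people.findall("name"):
    personnel[name.findtext("last")].append(name)

# Counterpart of key('Personnel', 'Smith'): a single dictionary lookup ...
by_key = [n.findtext("first") for n in personnel["Smith"]]
# ... which yields the same nodes as the predicate name[last='Smith'].
by_predicate = [n.findtext("first") for n in people.findall("name[last='Smith']")]

print(by_key)
print(by_predicate)
```

The example also shows that a key value need not be unique: Smith refers to two elements.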

1.5.25. Variables

Variables are named objects that hold values.
  • Defining the value:
    <xsl:variable name="color">red</xsl:variable>
    <xsl:variable name="colour" select="@color"/>
    
    
  • XSLT is a declarative language, so the value of a variable cannot be changed:
      <xsl:variable name="n">1</xsl:variable>
      <xsl:variable name="n">2</xsl:variable> <!-- WRONG! -->
  • However, a variable definition in a template overrides one made globally:
    <xsl:stylesheet ...>
      <xsl:variable name="level">1</xsl:variable>
      <xsl:template match="*">
        <xsl:variable name="level">2</xsl:variable> <!-- OK -->
  • Variables are referenced by prefixing their name with $:
    The sky was
    <FONT color="{$color}"><xsl:value-of select="$color"/></FONT>
  • Variables can also contain result-tree fragments:
    <xsl:variable name="warning"><hi>Warning!</hi></xsl:variable>
    ...
    <xsl:copy-of select="$warning"/>

1.5.26. Named templates

Repetitive output structures can be encapsulated in named templates.
  • A named template is defined in the usual way but it is given a name:
    <xsl:template name="line">
      <BR/><HR/><BR/>
    </xsl:template>
    
  • It is invoked by <call-template>:
      <xsl:template match="chapter">
        <xsl:call-template name="line"/>
        <H1>New chapter</H1> <xsl:apply-templates/>
      </xsl:template>
    
  • Templates can have parameters:
      <xsl:template name="colorize">
        <xsl:param name="color">white</xsl:param>
         <FONT color="{$color}"> <xsl:apply-templates/>
         </FONT>
      </xsl:template>
    
      <xsl:template match="error">
        <xsl:call-template name="colorize">
          <xsl:with-param name="color">red</xsl:with-param>
        </xsl:call-template>
      </xsl:template>
      <xsl:template match="warning">
        <xsl:call-template name="colorize">
          <xsl:with-param name="color">yellow</xsl:with-param>
        </xsl:call-template>
      </xsl:template>
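
Because variables cannot be reassigned, counting loops in XSLT 1.0 are commonly written as a named template that calls itself with a changed parameter. The following is a sketch of that idiom; the template name stars and the parameter n are hypothetical names introduced here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Emits n asterisks; n defaults to 3 when the caller passes nothing -->
  <xsl:template name="stars">
    <xsl:param name="n" select="3"/>
    <xsl:if test="$n &gt; 0">
      <xsl:text>*</xsl:text>
      <!-- Recursive call with a smaller value instead of updating a counter -->
      <xsl:call-template name="stars">
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <!-- First call uses the default, second overrides it -->
    <xsl:call-template name="stars"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:call-template name="stars">
      <xsl:with-param name="n" select="5"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this writes a line of three asterisks followed by a line of five.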
    

1.5.27. Other XSLT features

  • When we want to loop through elements directly, we can use direct processing:
    <xsl:for-each select="row">
      ...
      <xsl:for-each select="cell">
        ...
      </xsl:for-each>
    </xsl:for-each>
    
    (<xsl:sort> can be placed at the start of <xsl:for-each> to order the selected nodes)
  • When we want to invoke a simple kind of debugging:
    <xsl:template match="//warning/*/para">
      <xsl:message>Template //warning/*/para is activated!</xsl:message>
      <xsl:apply-templates/>
    </xsl:template>
    
  • When the stylesheet would contain only one template, we can use the single template shortcut:
    <BOOK xsl:version="1.0"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      ...
      <xsl:value-of .../> ...
    </BOOK>
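
A sketch that combines two of these features: a single-template (simplified) stylesheet that loops over rows and cells directly and sorts the rows. The input layout, a table root element containing row elements with cell children, is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<!-- Assumed input, for illustration only:
     <table>
       <row><cell>pear</cell><cell>2</cell></row>
       <row><cell>apple</cell><cell>5</cell></row>
     </table>
-->
<!-- The literal result element carrying xsl:version is the whole stylesheet;
     it behaves like one template matching the document root -->
<table xsl:version="1.0"
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="table/row">
    <!-- xsl:sort must come first; here rows are ordered by their first cell -->
    <xsl:sort select="cell[1]"/>
    <tr>
      <xsl:for-each select="cell">
        <td><xsl:value-of select="."/></td>
      </xsl:for-each>
    </tr>
  </xsl:for-each>
</table>
```

For the assumed input, the row starting with "apple" is written before the row starting with "pear".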
    

1.5.28. XSLT Version 2.0

What's new in XSLT V2.0:
  • not 100% backward compatible with XSLT V1.0;
  • many terminological and other changes in the specification;
  • a transformation can produce multiple result trees;
  • support for XPath V2.0 and stronger data typing;
  • facilities are introduced for grouping of nodes;
  • creation of user-defined functions within the stylesheet, that can be called from XPath expressions;
  • improved sorting;
  • an XHTML output method has been added.
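
As a hedged illustration of grouping and user-defined functions, the sketch below uses the syntax of the final XSLT 2.0 Recommendation, which may differ in detail from the drafts available when this lecture was written. The my namespace URI, the function my:initial, and the input layout (people/name/last) are assumptions introduced for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://example.org/my-functions"
                exclude-result-prefixes="xs my">
  <xsl:output method="xml" indent="yes"/>

  <!-- User-defined function, callable from XPath: upper-cased first letter -->
  <xsl:function name="my:initial" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="upper-case(substring($s, 1, 1))"/>
  </xsl:function>

  <!-- Assumes every name element has exactly one last child -->
  <xsl:template match="/people">
    <index>
      <!-- Group the names by the initial letter of their last child -->
      <xsl:for-each-group select="name" group-by="my:initial(last)">
        <xsl:sort select="current-grouping-key()"/>
        <group letter="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <entry><xsl:value-of select="last"/></entry>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </index>
  </xsl:template>
</xsl:stylesheet>
```

An XSLT 2.0 processor is required; an XSLT 1.0 processor will not recognise xsl:function or xsl:for-each-group.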

1.6. XSL

1.6.1. Introduction to XSL

Extensible Stylesheet Language (XSL) Version 1.0; W3C Recommendation October 2001
  • XSL is a markup language suitable for formatting material to screen and paper;
  • XSL is a powerful and complex language with 51 formatting object types, such as blocks, inline areas, lists, tables, dynamic features and links. Formatting objects are configured using some of the 231 properties also specified.
  • Currently, only limited support for typesetting with XSL exists; a common approach is to convert XSL documents to TeX, and from there to e.g. PDF or PostScript.

1.6.2. Formatting objects

  • XSL formatting instructions are XML elements containing the text they apply to
  • An XSL formatting instruction is called a formatting object;
  • The namespace of formatting objects is http://www.w3.org/1999/XSL/Format; it is commonly mapped to prefix fo
For example:
  • An HTML element:
    <P>A <B>bold</B> statement.</P>
    
  • An equivalent XSL element:
    <block>A <wrapper font-weight="bold">bold</wrapper> statement.</block>
    
    
  • An XSLT stylesheet implementing the transformation:
    
    <stylesheet version="1.0"
           xmlns="http://www.w3.org/1999/XSL/Transform"
           xmlns:fo="http://www.w3.org/1999/XSL/Format">
    
      <template match="P">
        <fo:block><apply-templates/></fo:block>
      </template>
    ...
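
For comparison, a self-contained sketch of such a stylesheet, written with the conventional xsl prefix and covering only the two HTML elements of the example above. It is not the remainder of the lecture's stylesheet, and it produces only the block itself rather than a complete XSL document with a root element and page masters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- An HTML paragraph becomes a block-level formatting object -->
  <xsl:template match="P">
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>

  <!-- Bold text becomes a wrapper that sets the font-weight property -->
  <xsl:template match="B">
    <fo:wrapper font-weight="bold"><xsl:apply-templates/></fo:wrapper>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the HTML element shown earlier, this yields an fo:block containing "A ", an fo:wrapper with font-weight="bold" around "bold", and " statement.".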
    
    

1.6.3. Templates and content

  • The root element in an XSL document is <root>; it contains two major sections: templates and content
  • Templates specify the characteristics of pages to display or print
  • Content is enclosed in page sequences, each making reference to a template

<root>
  <layout-master-set>
    <simple-page-master master-name="front">
      <!-- TEMPLATE 1 -->
    </simple-page-master>
    <simple-page-master master-name="body">
      <!-- TEMPLATE 2 -->
    </simple-page-master>
  </layout-master-set>

  <page-sequence master-reference="front">
      <!-- CONTENT 1 -->
  </page-sequence>

  <page-sequence master-reference="body">
      <!-- CONTENT 2 -->
  </page-sequence>
</root>

1.6.4. An example


<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" >
  <fo:layout-master-set>
    <fo:simple-page-master master-name="only" page-height="29.7cm" 
        page-width="21cm" margin-top="1cm" margin-bottom="2cm" 
        margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="only" initial-page-number="1">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block text-align="end" font-size="10pt" font-family="serif" 
          line-height="14pt">
XML Recommendation - p. 
         <fo:page-number/>
      </fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt" font-family="sans-serif" 
          line-height="24pt" space-after.optimum="15pt" 
          background-color="blue" color="white" text-align="center" 
          padding-top="0pt"> 
Extensible Markup Language (XML) 1.0 
      </fo:block>
     <fo:block font-size="16pt" font-family="sans-serif" 
         line-height="20pt" space-before.optimum="10pt" 
         space-after.optimum="10pt" text-align="start" padding-top="0pt">
Abstract
     </fo:block>
     <fo:block font-size="12pt" font-family="sans-serif" 
         line-height="15pt" space-after.optimum="3pt" text-align="start">
The Extensible Markup Language (XML) is a subset of SGML that is
completely described in this document. Its goal is to enable generic
SGML to be served, received, and processed on the Web in the way that
is now possible with HTML. XML has been designed for ease of
implementation and for interoperability with both SGML and HTML. For
further information go to
       <fo:basic-link external-destination="normal.pdf">normal.pdf</fo:basic-link>
     </fo:block>
...

1.7. Other XML Companion Recommendations

1.7.1. Other XML-related recommendations

“The nice thing about standards is that there are so many of them.”
XML Information Set (Infoset)
A set of definitions for use in specifications that need to refer to the information in an XML document.
XML Linking Language (XLink)
A language that allows elements to be inserted into XML documents in order to create and describe sophisticated links between resources.
XML Pointer Language (XPointer)
XPath-based language to be used as a fragment identifier for any URI-reference that locates an XML resource; supports addressing into the internal structures of XML documents.
XML Query
XPath-based query language, designed to be broadly applicable across many types of XML data sources.
Simple API for XML (SAX)
SAX (not a W3C recommendation) was the first widely adopted API for XML in Java, and is a de facto standard; now there are versions for several other programming language environments.
Document Object Model (DOM)
A platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents.
XHTML
This specification defines XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application.