Foundational course at
ESSLLI 2005
Annotation of Language Resources
Lecture II.
XML-Related Recommendations
Tomaž Erjavec
Department of Knowledge Technologies
Jožef Stefan Institute
Jamova 39
SI-1000 Ljubljana
Slovenia
This lecture discusses developments related to XML, in particular
XML Schemas, XML Namespaces, XPath, and the XML transformation
language, XSLT.
In this part we look at the following XML-related proposals:
- XML Namespaces
- XML Schemas
- XPath
- XSLT
- XSL
XML Namespaces: Motivation
- A single XML document could usefully contain elements and
attributes ("markup vocabulary") that are defined for and used by
multiple software modules.
- Such documents pose problems of recognition and
collision. Software modules need to be able to recognise the tags and
attributes which they are designed to process, even in the face of
"collisions" occurring when markup intended for some other software
package uses the same element type or attribute name.
- Therefore document constructs should have universal names, whose
scope extends beyond their containing document; such universal names
are defined by the XML Namespaces specification (January 1999).
- Namespaces make use of the notion of a Uniform Resource
Identifier (URI), which identifies a resource by
meta-information of any kind; in contrast, a URL locates a resource
on the net, which means if you have a URL and the appropriate protocol
you can retrieve the resource. A namespace URI serves only as an
identifier: nothing needs to be retrievable from it.
<?xml version="1.0" ?>
<html:html
xmlns:html="http://www.w3.org/HTML/1998/html4"
xmlns:nms="http://www.names.net/address">
<html:head><html:title>Addresses</html:title></html:head>
<html:body>
<nms:addresses nms:version="1.0">
<html:hr/>
<nms:person xmlns:nms="http://www.names.net/address-addendum">
<nms:title>Mr.</nms:title>
<nms:first>Simon</nms:first>
<nms:last>Schuster</nms:last>
</nms:person>
<html:hr/>
<!-- ... -->
</nms:addresses>
</html:body>
</html:html>
- XML Namespaces provide a two-part naming system for element
types and attributes
- An attribute of the form xmlns:prefix declares a namespace: its
value is the namespace URI, and prefix is the local name used for it
- Qualified names consist of the prefix, a colon, and the local part of
the name
- A prefix declaration is inherited by descendant elements and may be
overridden by them, as happens for nms on nms:person above
<?xml version="1.0" ?>
<html xmlns="http://www.w3.org/HTML/1998/html4"
xmlns:nms="http://www.names.net/address">
<head><title>Addresses</title></head>
<body xml:lang="en">
<nms:addresses nms:version="1.0">
<hr/>
<nms:person>
<nms:title>Mr.</nms:title>
<nms:first>Simon</nms:first>
<nms:last>Schuster</nms:last>
</nms:person>
<hr/>
<!-- ... -->
</nms:addresses>
</body>
</html>
- The default namespace is declared by the attribute
xmlns, without a local prefix; it applies to unprefixed element
names, but not to unprefixed attributes
- The prefix xml is by definition bound to the namespace name
http://www.w3.org/XML/1998/namespace
There is less to XML Namespaces than meets the eye!
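A minimal sketch of what this means in practice: the XSLT 1.0 stylesheet below (XSLT is covered later in this lecture) is meant to be applied to the second, default-namespace example above. The prefixes h and a are arbitrary choices made for this illustration; what counts when matching is the namespace URI, never the prefix.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/HTML/1998/html4"
    xmlns:a="http://www.names.net/address">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- An unprefixed name in XPath 1.0 means "no namespace": counts 0 -->
    <xsl:value-of select="count(//head)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The same element via a prefix bound to the default namespace URI: counts 1 -->
    <xsl:value-of select="count(//h:head)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Prefix a differs from nms in the document, but the URI is the same -->
    <xsl:for-each select="//a:person">
      <xsl:value-of select="concat(a:first, ' ', a:last)"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With an XSLT 1.0 processor the result should be three lines: 0, 1 and Simon Schuster. In XPath 1.0 an unprefixed name always means "no namespace", so elements in a default namespace can only be selected through some prefix bound to the same URI.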
Document Type Definitions (DTDs) are the traditional way in which to
declare document types and to validate SGML/XML documents. However,
they have two problems:
- DTDs can impose only weak constraints on attribute and element
content
- DTDs themselves are not written in XML, so tools to process
(edit, validate, present) XML do not work with them
Several proposals exist to address these shortcomings:
- XML Schema: W3C Recommendation
- RELAX NG (Regular Language Description for XML -- Next Generation):
based on TREX (James Clark) and RELAX (Murata Makoto);
standardised as ISO/IEC 19757-2 (2003)
- Schematron: Rick Jelliffe (Academia Sinica);
moving towards an ISO standard (ISO CD)
Validators exist for all of the above proposals, and there are also
tools that convert DTDs to schemas.
W3C Schemas: an Example XML Document
Example XML document from the W3C Schema Primer:
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
XML Schemas: an Example Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
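Several of the constraints in this schema could not be expressed in a DTD. As a sketch, the hypothetical instance below assumes the schema above has been saved as po.xsd (the file name is an assumption) and points to it with xsi:noNamespaceSchemaLocation, the appropriate attribute because the schema declares no target namespace. It deliberately breaks two rules, which a schema-aware validator should report:

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance; assumes the schema above is saved as po.xsd -->
<purchaseOrder orderDate="1999-10-20"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="po.xsd">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <items>
    <!-- Invalid: "872-A" does not match the SKU pattern \d{3}-[A-Z]{2} -->
    <item partNum="872-A">
      <productName>Lawnmower</productName>
      <!-- Invalid: quantity must be a positive integer below 100 -->
      <quantity>100</quantity>
      <USPrice>148.95</USPrice>
    </item>
  </items>
</purchaseOrder>
```

The schema location attribute is only a hint; a validator may also be given the schema explicitly.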
Features of RELAX NG are that it:
- is simple
- is easy to learn
- has both an XML syntax and a compact non-XML syntax
- supports XML namespaces
- treats attributes uniformly with elements so far as possible
- has unrestricted support for unordered content
- has unrestricted support for mixed content
- has a solid theoretical basis
- can partner with a separate datatyping language
(such as W3C XML Schema Datatypes)
- Example of a RELAX NG schema:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
<zeroOrMore>
<element name="card">
<attribute name="type"><text/></attribute>
<element name="name"><text/></element>
<element name="email"><text/></element>
<optional><element name="note"><text/></element></optional>
</element>
</zeroOrMore>
</element>
- Compact notation:
element addressBook {
element card {
attribute type { text },
element name { text },
element email { text },
element note { text }?
}*
}
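For illustration, a small document that should be valid against the XML-syntax schema above might look as follows (names and addresses are made up). Every card has a type attribute, a name and an email, while note may be omitted:

```xml
<addressBook>
  <!-- type is required; note is optional -->
  <card type="work">
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card type="home">
    <name>Fred Bloggs</name>
    <email>fred@example.net</email>
    <note>Prefers phone calls</note>
  </card>
</addressBook>
```

Omitting email, or putting email before name, should make the document invalid, since the child elements of card form an ordered group; the position of the attribute, on the other hand, does not matter.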
Identity of XML Documents
When are two XML documents the same?
Document 1:
<anthology>
<poem id = "p001" rend = "center">
<title>The SICK ROSE</title>
<line>O Rose thou art sick.</line>
</poem></anthology>
Document 2:
<anthology
><poem rend='center'
id='p001'><title>The SICK ROSE</title>
<line>O Rose thou art sick.</line>
</poem> <!--end of the poem-->
</anthology>
- Two XML documents are "the same", when they are logically
equivalent within an application context
- Differences that are irrelevant:
- Order of attributes and usage of quotes
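- Non-significant whitespace (whether whitespace in content is
significant depends on the presence of a DTD)
- Representation of characters: ü, &#252; and &#xFC; denote the
same character, as do ì, &#236; and &#xEC;
- Entity references vs. their expansion (e.g. &euml; vs. ë, or
references to SYSTEM entities vs. the included text)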
- Comments
- W3C Recommendation "Canonical XML" describes a method for
generating a physical representation, the canonical form, of an XML
document that accounts for the permissible changes
- XML canonicalization is defined in terms of the XPath definition
of a node-set
- W3C Recommendation "XML Information Set" defines an abstract
data model called the XML Information Set (Infoset).
- Its purpose is to provide definitions for use in other
specifications that need to refer to the information in a well-formed
XML document
- An XML document's information set (a tree) consists of a number of
information items (nodes); each information item has a set of
associated named properties.
- Information items are very similar to nodes in XPath
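As a small sketch of what canonicalization does, consider the made-up document below:

```xml
<?xml version="1.0"?>
<!-- Attributes out of order, single quotes, spaces around =, a character reference, an empty-element tag -->
<poem rend = 'center' id="p001"><line>O Rose&#32;thou art sick.</line><br/></poem>
```

Its canonical form (Canonical XML 1.0, without comments) should be <poem id="p001" rend="center"><line>O Rose thou art sick.</line><br></br></poem>: the XML declaration and the comment are removed, attributes are sorted by name and written with double quotes, the character reference &#32; becomes a plain space, and the empty-element tag becomes a start/end tag pair.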
Formatting and Transforming XML
This part of the course deals with the XSL family of W3C
recommendations, in particular XPath, XSLT, and XSL. The structure and
examples follow the book:
Neil Bradley: The XSL Companion. Addison-Wesley, 2000.
Formatting and Transforming XML: Introduction
XML markup is supposed to be descriptive (e.g. <title>) rather than
presentational (e.g. <bold>). But, sooner or later, we do want to
render the documents. How do we do this?
- rendering built directly into software (e.g. HTML browsers)
- direct conversion to output format with XML aware transformation
software (e.g. with XSLT to HTML)
- conversion to an intermediate, abstract, presentation-oriented format,
and from there to the final output format (e.g. with XSLT to XSL, then to PDF)
Styling languages:
- HTML: CSS (Cascading Style Sheets)
- SGML: DSSSL (Document Style Semantics and Specification Language)
- XML: XSL (eXtensible Stylesheet Language)
A stylesheet language named XSL was proposed to the W3C in 1997.
During its development, the proposal was split into three separate
standards:
- XPath (V1.0, November 1999)
- defines a mechanism for locating information in XML documents,
and has many other uses besides that in formatting documents
- XSLT (V1.0, November 1999)
- defines a means of transforming XML documents into other data
formats (XML or otherwise), including (but not limited to)
formatting languages
- XSL (V1.0, October 2001)
- the name is now properly used only for the standard that expresses
formatting information in documents using XML elements (formatting objects).
XML Path Language (XPath)
Version 1.0, W3C Recommendation November 1999
(Version 2.0, W3C Working Draft 30 April 2002).
- The primary purpose of XPath is to address parts of an XML
document;
however, it has a natural subset that can be used for testing whether
or not a node matches a pattern.
- XPath uses a compact, non-XML syntax; it gets its name from its
use of a path notation as in URLs for navigating through the
hierarchical structure of an XML document.
XPath: Introduction 2
- XPath operates on the abstract, logical structure of an XML
document (its Infoset), rather than its surface syntax;
it models an XML document as a tree of nodes. There are different
types of nodes, including element nodes, attribute nodes and text
nodes.
- The primary syntactic construct in XPath is the expression,
which is evaluated to yield an object of type: node-set (an unordered
collection of nodes without duplicates), boolean, number, or string.
- The full syntax of XPath expressions is cumbersome,
so various abbreviations are allowed.
- An expression is evaluated with respect to a context
node.
- *
- selects all element children of the
context node
- @name
- selects the attribute name of
the context node
- @*
- selects all the attributes of the
context node
- para[1]
- selects the first para child of
the context node
- */para
- selects all para grandchildren of
the context node
- /doc/chapter[5]/section[2]
- selects the second
section of the fifth chapter of the
doc child of the root node
- //para
- selects all the para descendants of
the document root and thus selects all para elements in the
same document as the context node
- //olist/item
- selects all the item
elements in the same document as the context node that have an
olist parent
- .
- selects the context node
- .//para
- selects the para element
descendants of the context node
- ..
- selects the parent of the context node
- ../@lang
- selects the lang attribute
of the parent of the context node
- para[@type="warning"][5]
- selects the fifth
para child of the context node that has a type
attribute with value warning
- para[5][@type="warning"]
- selects the fifth
para child of the context node if that child has a
type attribute with value warning
- chapter[title="Introduction"]
- selects the
chapter children of the context node that have one or
more title children with string-value equal to
Introduction
- chapter[title]
- selects the chapter
children of the context node that have one or more title
children
- employee[@secretary and @assistant]
- selects all
the employee children of the context node that have both a
secretary attribute and an assistant
attribute
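One way to experiment with such expressions is a tiny XSLT 1.0 stylesheet that prints the values of a few of them. The sketch below assumes it is applied to the purchase order document from the XML Schema section; the expected results in the comments were worked out by hand from that document.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Evaluate a few location paths against purchaseOrder, one result per line -->
  <xsl:template match="/">
    <!-- every item element anywhere in the document: 2 -->
    <xsl:value-of select="count(//item)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- product name of the first item child of items: Lawnmower -->
    <xsl:value-of select="/purchaseOrder/items/item[1]/productName"/>
    <xsl:text>&#10;</xsl:text>
    <!-- attribute test in a predicate: 39.98 -->
    <xsl:value-of select="//item[@partNum='926-AA']/USPrice"/>
    <xsl:text>&#10;</xsl:text>
    <!-- existence test: items that have a comment child: 1 -->
    <xsl:value-of select="count(//item[comment])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- parent step, then attribute: country of the address in Old Town: US -->
    <xsl:value-of select="//city[. = 'Old Town']/../@country"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```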
XPath Expressions, cont.
Each location step has the form:
axis-name :: node-test [predicate]*
- Some axis names: child, parent, descendant, ancestor, self,
ancestor-or-self, following-sibling, following, attribute, ...
- The ancestor, descendant,
following, preceding and self
axes partition a document (ignoring attribute and namespace nodes):
they do not overlap and together they contain all the nodes in the
document.
- Some node tests: a literal name, *, text(), node(), ...
- A predicate is an expression; it is composed of values,
operators and other XPath expressions. XPath also defines a set of
functions for use in predicates.
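The following sketch, again assuming the purchase order document as input, exercises a few axes; note that on the reverse axes (ancestor, preceding-sibling) position 1 is the node nearest to the context node:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- following-sibling: billTo, comment and items come after shipTo (3) -->
    <xsl:value-of select="count(/purchaseOrder/shipTo/following-sibling::*)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- ancestor is a reverse axis, so [1] is the nearest ancestor (billTo) -->
    <xsl:value-of select="name(/purchaseOrder/billTo/zip/ancestor::*[1])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- preceding-sibling is also reverse: the sibling just before shipDate (USPrice) -->
    <xsl:value-of select="name(//shipDate/preceding-sibling::*[1])"/>
    <xsl:text>&#10;</xsl:text>
    <!-- unabbreviated syntax with an attribute predicate (Baby Monitor) -->
    <xsl:value-of select="/child::purchaseOrder/child::items/child::item[attribute::partNum='926-AA']/child::productName"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```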
Location paths are similar to file system paths
(the child:: axis is omitted below):
- A
- selects all elements A that are
children of the context node
- A/B
- selects B elements that are
children of A
- A//B
- selects B elements that are
descendants of A
- /A
- selects the root (document) element A
- /A//B
- selects B elements that are
descendants of root element A
XPath also defines a number of functions. Here are some examples of
their use in expressions:
- child::text()
- selects all text node children of the
context node
- child::para[last()]
- selects the last para child
of the context node
- child::para[position()=1]
- selects the first
para child of
the context node
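Abbreviated Syntax Expressions
Full syntax is cumbersome, so various abbreviations are allowed:
- child:: can be omitted from a location step
- attribute:: can be abbreviated to @
- a predicate [position()=n] can be abbreviated to [n]
- /descendant-or-self::node()/ can be abbreviated to //
- self::node() can be abbreviated to ., and parent::node() to ..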
For example,
child::para[position()=1][attribute::type="warning"] can
be abbreviated to
para[1][@type="warning"]
XPath String Functions
A number of functions are also defined for strings; some examples
are:
- concat(string, string, string*)
- returns the concatenation of its arguments.
- contains(string, string)
- returns true if the first argument string contains the second
argument string, and otherwise returns false. For example,
contains("abc", "b") returns true.
- substring-before(string, string)
- returns the substring of the first argument string that precedes
the first occurrence of the second argument string in the first
argument string, or the empty string if the first argument string does
not contain the second argument string. For example,
substring-before("1999/04/01","/") returns "1999".
- normalize-space(string?)
- returns the argument string with whitespace normalised by
stripping leading and trailing whitespace and replacing sequences of
whitespace characters by a single space. For example,
normalize-space("  a   b  c  ") returns "a b c".
etc.
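A sketch applying some string functions to the purchase order document (substring-after is another XPath 1.0 string function, not listed above):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- year part of the orderDate attribute (1999) -->
    <xsl:value-of select="substring-before(/purchaseOrder/@orderDate, '-')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- letters of the first part number (AA) -->
    <xsl:value-of select="substring-after(//item[1]/@partNum, '-')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- joining strings (Alice Smith, Mill Valley) -->
    <xsl:value-of select="concat(//shipTo/name, ', ', //shipTo/city)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- a node-set argument becomes the string-value of its first node (true) -->
    <xsl:value-of select="contains(//comment, 'lawn')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```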
XSLT defines a means of transforming XML documents into other data
formats. It can be used:
- for transforming XML documents into documents in the XSL
formatting vocabulary, i.e. for formatting XML documents
- as a general XML transformation language, used to
transmit data between applications
- at present, most popularly, to convert XML into HTML.
Processors:
- An XSLT processor, when supplied with an XSLT stylesheet, converts
XML input into XML (e.g. XSL), HTML or text output
- An XSL processor converts XSL documents into device-dependent output
formats.
How do we specify an XSLT stylesheet?
- It is an XML document, and the root element is
<xsl:stylesheet> ... </xsl:stylesheet>
- To indicate that XSLT can be used for more than just styling, an
equivalent element is:
<xsl:transform> ... </xsl:transform>
- Typical form:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
.
.
.
</xsl:stylesheet>
Invoking Stylesheets
There are three ways of invoking stylesheets:
- Unlinked stylesheet:
xsltproc tlslides.xsl esslli05.xml > esslli05.html
- Referenced stylesheet:
<?xml-stylesheet type="text/xsl" href="tlslides.xsl"?>
<!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd'>
- Embedded stylesheet:
<?xml-stylesheet type="text/xsl" href="#localStyle"?>
<!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd'>
<TEI.2>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
id="localStyle">
...
</xsl:stylesheet>
...
</TEI.2>
Stylesheets are composed (mostly) of templates.
A typical template could look like this (here, and in some examples
below, the XSLT namespace is the default namespace, so no xsl: prefix
is used):
<template match="para">
<apply-templates/>
</template>
- A template specifies the transformation that is to be applied to
a specific part of the source document
- The match attribute, whose value is a pattern (a
subset of XPath expressions), specifies which nodes the template
applies to; the node being processed becomes the context node for
the expressions inside the template
- <apply-templates/> triggers processing of the child
nodes of the context node
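A complete, if minimal, stylesheet built from such templates might look as follows. It is a sketch assuming a hypothetical input such as <doc><title>On Roses</title><para>O Rose thou art <hi>sick</hi>.</para></doc>, and it uses the xsl: prefix throughout:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- The document element becomes the HTML page -->
  <xsl:template match="doc">
    <html>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <!-- Deliberately no template for hi: the built-in rule
       processes its children, so its text still reaches the output -->
</xsl:stylesheet>
```

The result should be an HTML page whose body holds an h1 with the title and a p with the text of the paragraph, including the word inside hi: where no template matches, XSLT's built-in rules process the children of elements and copy text nodes through.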
Selective and Repeated Processing
Prefixes and Suffixes
- Text can be inserted before and after the node:
<template match="chapter">
This text will appear before the content of chapter
<apply-templates/>
This text will appear after the content of chapter
</template>
- Using the <text> element allows for better control of whitespace:
<template match="quote">
<text> "</text>
<apply-templates/>
<text>" </text>
</template>
<template match="lb">
<text> </text>
</template>
Tag Replacement and Namespaces
Because the main purpose of XSLT is to convert to XML or HTML,
replacing or inserting tags is very common. This can be done in two
different ways:
- Using the XSLT <element>:
<template match="book">
<element name="HTML">
<element name="HEAD">
<element name="TITLE">The Title</element>
</element>
<element name="BODY">
<apply-templates/>
</element>
</element>
</template>
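- Using literal result elements, which XML Namespaces make possible
(only elements in the XSLT namespace are treated as instructions):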
<xsl:template match="book">
<HTML>
<HEAD><TITLE>The Title</TITLE></HEAD>
<BODY>
<xsl:apply-templates/>
</BODY>
</HTML>
</xsl:template>
Input:
<para>Hello <hi>world</hi>!</para>
- Copying document fragments:
<xsl:template match="para">
<xsl:copy-of select="."/>
</xsl:template>
Output: <para>Hello <hi>world</hi>!</para>
- Accessing the content of an element as a string:
<xsl:template match="para">
<xsl:value-of select="."/>
</xsl:template>
Output: Hello world!
- Accessing specific elements:
<xsl:template match="para">
<xsl:value-of select="hi"/>
</xsl:template>
Output: world
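A common way of copying a document while changing only selected parts is the so-called identity transformation. A sketch, where renaming hi to b is just an example change:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy each attribute and node, then recurse -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Override for one element: rename hi to b, keep its content -->
  <xsl:template match="hi">
    <b><xsl:apply-templates select="@*|node()"/></b>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the input above, this should produce <para>Hello <b>world</b>!</para> (preceded by an XML declaration). The hi template wins over the general one because a plain element name has a higher default priority than the node() and @* tests.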
Breaking Well-Formedness
An XSLT stylesheet must be a well-formed XML document, and what it
produces is a result tree, not a stream of tags. This can sometimes be
problematic:
Input:
<first>John</first> <last>Smith</last>
<first>Frank</first> <last>Furter</last>
Intended output:
<p>John Smith</p>
<p>Frank Furter</p>
First try:
<xsl:template match="first">
<P> <xsl:value-of select="."/> <!-- WRONG! -->
</xsl:template>
<xsl:template match="last">
<xsl:value-of select="."/> </P> <!-- WRONG! -->
</xsl:template>
<xsl:template match="first">
<p> <xsl:value-of select="."/> <!-- Escape <p> -->
</xsl:template>
<xsl:template match="last">
<xsl:value-of select="."/> </p> <!-- Escape </p> -->
</xsl:template>
However, this doesn't work:
<p> John Smith </p>
<p> Frank Furter </p>
Disable Output Escaping
Output escaping can be disabled:
<xsl:template match="first">
<xsl:text disable-output-escaping="yes"><p></xsl:text>
<xsl:value-of select="."/>
</xsl:template>
<xsl:template match="last">
<xsl:value-of select="."/>
<xsl:text disable-output-escaping="yes"></p></xsl:text>
</xsl:template>
Input:
<first>John</first> <last>Smith</last>
<first>Frank</first> <last>Furter</last>
Output:
<p>John Smith</p>
<p>Frank Furter</p>
The perceived need for disable-output-escaping usually comes from
thinking about the transformation in the wrong way: XSLT is not about
writing out start and end tags in a linear stream, but about
transforming one tree structure into another.
So, a better way of implementing the transformation would be:
...
<xsl:apply-templates select="first"/>
...
<xsl:template match="first">
<P>
<xsl:apply-templates/>
<xsl:apply-templates select="following-sibling::last[1]"/>
</P>
</xsl:template>
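A complete, runnable version of this approach might look as follows; it assumes, hypothetically, that the name elements are wrapped in a names element:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- Hypothetical wrapper element: process only the first names -->
  <xsl:template match="names">
    <div>
      <xsl:apply-templates select="first"/>
    </div>
  </xsl:template>
  <!-- Each first name builds a whole p element, pulling in the next last name -->
  <xsl:template match="first">
    <p>
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="following-sibling::last[1]"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

For the input <names><first>John</first> <last>Smith</last> <first>Frank</first> <last>Furter</last></names> the result should be <div><p>John Smith</p><p>Frank Furter</p></div>; each p element is created in one piece, so no escaping tricks are needed.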
These elements are children of <stylesheet> (top-level elements):
- External definitions:
<xsl:import href="tbl1.xsl"/> <!--must come first; its templates have lower precedence-->
<xsl:include href="tbl2.xsl"/> <!--included here, with equal precedence-->
- Output specification:
<xsl:output
method="xml"
version="1.0"
encoding="ISO-8859-1"
standalone="no"
doctype-system="tei2.dtd"
doctype-public="-//TEI P3//DTD Main Document Type//EN"
indent="yes"
cdata-section-elements="code eg"
media-type="text/xml"/>
- Treatment of element whitespace:
<xsl:preserve-space elements="head p"/>
<xsl:strip-space elements="div"/>
- Also some others, in particular, <template>
Contextual Formatting
Templates can be sensitive to the context of an element:
- Trivial case:
<xsl:template match="div"> ... </xsl:template>
<xsl:template match="head"> ... </xsl:template>
<xsl:template match="p"> ... </xsl:template>
- Specific ancestor, child and sibling:
<xsl:template match="A//X"> ... </xsl:template>
<xsl:template match="X[C]"> ... </xsl:template>
<xsl:template match="P[S]/X"> ... </xsl:template>
- Specific attribute and attribute value:
<xsl:template match="X[@a]"> ... </xsl:template>
<xsl:template match="X[@a='v']"> ... </xsl:template>
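Only one XSLT template can apply to a specific instance of an
element. If two or more templates match an element, the conflict is
resolved using template priority: more specific patterns get a higher
default priority, and an explicit priority attribute overrides it.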
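A sketch of how priorities decide between competing templates, for a made-up input <doc><p/><p type="note"/><div><p type="note"/></div></doc> (written on one line, so there is no whitespace to copy). In XSLT 1.0 a plain element name has default priority 0, while patterns with predicates or several steps get 0.5:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Default priority 0: a plain element name -->
  <xsl:template match="p">plain&#10;</xsl:template>
  <!-- Default priority 0.5: the predicate makes the pattern more specific -->
  <xsl:template match="p[@type='note']">note&#10;</xsl:template>
  <!-- An explicit priority overrides the defaults -->
  <xsl:template match="div/p" priority="2">in div&#10;</xsl:template>
</xsl:stylesheet>
```

The three p elements should produce plain, note and in div, one per line: the last p matches all three patterns, and the explicit priority of 2 wins.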
The same content may need to be processed in several parts of the
output and formatted differently in each. This context-dependent
behaviour can be achieved using modes:
<xsl:template match="title">
<!-- formatting for <title> in the body of document -->
</xsl:template>
<xsl:template match="title" mode="toc">
<!-- formatting for <title> in the table of contents -->
</xsl:template>
...
<!-- Generating the table of contents: -->
<xsl:apply-templates mode="toc" select="//title"/>
Use of the XSLT element <attribute>:
Input:
<image file="house.jpg" x="100" y="100">My house</image>
Output:
<IMG SRC="house.jpg" HEIGHT="100" WIDTH="100" ALT="My house"/>
Template:
<xsl:template match="image">
<xsl:element name="IMG">
<xsl:attribute name="SRC"><value-of select="@name"/></xsl:attribute>
<xsl:attribute name="HEIGHT"><value-of select="@x"/></xsl:attribute>
<xsl:attribute name="WIDTH"><value-of select="@y"/></xsl:attribute>
<xsl:attribute name="ALT"><value-of select="."/></xsl:attribute>
</xsl:element>
</xsl:template>
Alternatively, attribute value templates (XPath expressions in curly
brackets) can be used as a shorthand:
<xsl:template match="image">
<IMG SRC="{@file}" HEIGHT="{@y}" WIDTH="{@x}" ALT="{text()}"/>
</xsl:template>
Conditional Constructs
XSLT provides two elements to select optional pieces of a template:
- If statement:
<xsl:if test="not(position() = last)">
<xsl:text>, </xsl:text>
</xsl:if>
- Multiple choices:
<xsl:choose>
<xsl:when test="@type='error'">
<FONT color="red"><xsl:apply-templates/></FONT>
</xsl:when>
<xsl:when test="@type='warning'">
<FONT color="yellow"><xsl:apply-templates/></FONT>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
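As a sketch of xsl:if, the stylesheet below joins the names in a people list such as <people><name>John Smith</name><name>Frank Furter</name></people> into a comma-separated line:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="people">
    <!-- select="name" leaves out whitespace text nodes, so position() counts names only -->
    <xsl:apply-templates select="name"/>
  </xsl:template>
  <xsl:template match="name">
    <xsl:value-of select="."/>
    <!-- separator after every name except the last one -->
    <xsl:if test="not(position() = last())">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

For that input the result should be John Smith, Frank Furter.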
Sorting specified in <apply-templates>:
Input:
<people>
<name>
<first>John</first>
<last>Smith</last>
<age>53</age>
</name>
<name>
<first>Frank</first>
<last>Furter</last>
<age>35</age>
</name>
</people>
Template:
<xsl:template match="people">
<DIV>
<xsl:apply-templates>
<xsl:sort select="last">
<xsl:sort data-type="number" select="age">
</xsl:apply-templates>
</DIV>
</xsl:template>
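A runnable variant of this example, as a sketch that sorts the same input by age in descending order and prints plain text:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="people">
    <!-- data-type="number" compares ages numerically; as text, "100" would sort before "35" -->
    <xsl:apply-templates select="name">
      <xsl:sort select="age" data-type="number" order="descending"/>
    </xsl:apply-templates>
  </xsl:template>
  <xsl:template match="name">
    <xsl:value-of select="concat(last, ', ', first, ' (', age, ')&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the input above this should print Smith, John (53) followed by Furter, Frank (35), the reverse of the order by last name.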
Numbering is specified in <template> with <number>:
- The number is the position of the element among its sibling
elements of the same name:
Input:
<people>
<name>John Smith</name>
<name>Frank Furter</name>
</people>
Output:
1) John Smith
2) Frank Furter
Template:
<xsl:template match="name">
<xsl:number/> <xsl:text>) </xsl:text>
<xsl:apply-templates/>
</xsl:template>
- Formatting:
Template:
<xsl:number format="a"/>. <xsl:apply-templates/>
Output:
a. John Smith
b. Frank Furter
Template:
<xsl:number format="(i)"/> <xsl:apply-templates/>
Output:
(i) John Smith
(ii) Frank Furter
XML IDs can be used in XSLT:
- The ID attribute must be declared in DTD:
<!ATTLIST div name ID #REQUIRED>
- IDs are accessed with the id() XPath function:
<xsl:template match="id('chp-intro')"> ... </xsl:template>
- The argument of id() is read like an IDREFS value, i.e. as a
whitespace-separated list of IDs:
<xsl:template match="id('chp-intro chp-conc')"> ... </xsl:template>
A more flexible method is to use keys (<xsl:key> and the key()
function), which need no DTD declarations.
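A sketch of keys, using the purchase order document as input; the key name by-part is a hypothetical choice. Unlike id(), a key can index any element by any value:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Index every item by its partNum attribute; no DTD or ID type needed -->
  <xsl:key name="by-part" match="item" use="@partNum"/>
  <xsl:template match="/">
    <!-- Look up the item with part number 926-AA -->
    <xsl:value-of select="key('by-part', '926-AA')/productName"/>
  </xsl:template>
</xsl:stylesheet>
```

This should print Baby Monitor.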
Variables are named objects that hold values.
- Defining the value:
<xsl:variable name="color">red</xsl:variable>
- XSLT is a declarative language, so a value of a variable
cannot be changed:
<xsl:variable name="n">1</xsl:variable>
<xsl:variable name="n">2</xsl:variable> <!-- WRONG! -->
- However, a variable definition in a template overrides one
made globally:
<xsl:stylesheet ...>
<xsl:variable name="level">1</xsl:variable>
<xsl:template match=".*">
<xsl:variable name="level">2</xsl:variable> <!-- OK -->
- Variables are referenced by prefixing their name with $:
The sky was
<FONT color="{$color}"><xsl:value-of select="$color"></FONT>
- Variables can also contain result-tree fragments:
<xsl:variable name="warning"><hi>Warning!</hi></xsl:variable>
...
<xsl:copy-of select="$warning">
Repetitive output structures can be encapsulated in
named templates.
- A named template is defined in the usual way but it is given a name:
<xsl:template name="line">
<BR/><HR/><BR/>
</xsl:template>
- It is invoked by <call-template>:
<xsl:template select="chapter">
<xsl:call-template name="line">
<H1>New chapter</H1> <xsl:apply-templates/>
</xsl:template>
- Templates can have parameters:
<xsl:template name="colorize">
<xsl:param name="color">white</xsl:param>
<FONT color="{$color}"> <xsl:apply-templates/>
</FONT>
</xsl:template>
<xsl:template select="error">
<xsl:call-template name="colorize">
<xsl:with-param name="color">red</xsl:with-param>
</xsl:call-template>
</xsl:template>
<xsl:template select="warning">
<xsl:call-template name="colorize">
<xsl:with-param name="color">yellow</xsl:with-param>
</xsl:call-template>
</xsl:template>
- When the stylesheet would contain only one template, we can use
the simplified stylesheet syntax (a literal result element as the stylesheet):
<BOOK xsl:version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
...
<xsl:value-of .../> ...
</BOOK>
- When we want to loop through elements directly,
we can use direct processing:
<xsl:for-each select="row">
...
<xsl:for-each select="cell">
...
</xsl:for-each>
</xsl:for-each>
- When we want a simple kind of debugging, <xsl:message> can report
that a template has been activated:
<xsl:template match="//warning/*/para">
<xsl:message>Template //warning/*/para is activated!</xsl:message>
<xsl:apply-templates/>
</xsl:template>
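Since variables cannot be changed, loops that go beyond for-each are usually written as recursive named templates with parameters. A minimal sketch (the template name repeat is hypothetical) that outputs a string a given number of times:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical named template: output $text $count times by recursion -->
  <xsl:template name="repeat">
    <xsl:param name="text"/>
    <xsl:param name="count" select="0"/>
    <xsl:if test="$count &gt; 0">
      <xsl:value-of select="$text"/>
      <!-- Recurse with a new parameter value instead of updating a variable -->
      <xsl:call-template name="repeat">
        <xsl:with-param name="text" select="$text"/>
        <xsl:with-param name="count" select="$count - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
  <xsl:template match="/">
    <xsl:call-template name="repeat">
      <xsl:with-param name="text" select="'-'"/>
      <xsl:with-param name="count" select="5"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

With any input document this should output five hyphens.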
What's new in XSLT V2.0:
- not 100% backward compatible with XSLT V1.0;
- many terminological and other changes in the
specification;
- a transformation can produce multiple result trees;
- support for XPath V2.0 and stronger data typing;
- facilities are introduced for grouping of nodes;
- creation of user-defined functions within the stylesheet, that
can be called from XPath expressions;
- improved sorting;
- an XHTML output method has been added.
- XSL is a markup language suitable for formatting material to
screen and paper;
- XSL is a powerful and complex language with 51 formatting
object types, such as blocks, inline areas, lists, tables,
dynamic features and links.
Formatting objects are configured using a subset of the 231
properties that the specification also defines.
- Currently, only limited support for typesetting with XSL exists;
a common approach is to convert XSL documents to TeX, and from there
to e.g. PDF or PostScript.
- XSL formatting instructions are XML elements containing the text
they apply to
- An XSL formatting instruction is called a formatting
object;
- The namespace of formatting objects is
http://www.w3.org/1999/XSL/Format; it is commonly
mapped to prefix fo
For example:
- An HTML element:
<P>A <B>bold</B> statement.</P>
- An equivalent XSL element:
<block>A <wrapper font-weight="bold">bold</wrapper> statement.</block>
- An XSLT stylesheet implementing the transformation:
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<template select="P">
<fo:block><apply-templates/></fo:block>
</template>
...
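A complete version of this stylesheet might look as follows. It is a sketch for a hypothetical input such as <P>A <B>bold</B> statement.</P>; it wraps the generated blocks in a minimal page layout (one A4 page master and a single flow) and follows the XSL 1.0 Recommendation in having fo:page-sequence refer to its page master with master-reference:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- FO skeleton: one page master, one page sequence, one flow -->
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-height="29.7cm" page-width="21cm"
            margin-top="2cm" margin-bottom="2cm"
            margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <!-- Paragraphs become blocks -->
  <xsl:template match="P">
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>
  <!-- Bold text becomes a wrapper carrying the font-weight property -->
  <xsl:template match="B">
    <fo:wrapper font-weight="bold"><xsl:apply-templates/></fo:wrapper>
  </xsl:template>
</xsl:stylesheet>
```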
Templates and Content
- The root element in an XSL document is <root>; it contains
two major sections: templates and content
- Templates specify the characteristics of pages to display or print
- Content is enclosed in page sequences, each making reference to
a template
<root>
<layout-master-set>
<simple-page-master master-name="front">
<!-- TEMPLATE 1 -->
</simple-page-master>
<simple-page-master master-name="body">
<!-- TEMPLATE 2 -->
</simple-page-master>
</layout-master-set>
<page-sequence-master master-name="front">
<!-- CONTENT 1 -->
</page-sequence-master>
<page-sequence-master master-name="body">
<!-- CONTENT 2 -->
</page-sequence-master>
</root>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<fo:layout-master-set>
<fo:simple-page-master master-name="only" page-height="29.7cm"
page-width="21cm" margin-top="1cm" margin-bottom="2cm"
margin-left="2.5cm" margin-right="2.5cm">
<fo:region-body margin-top="3cm"/>
<fo:region-before extent="3cm"/>
<fo:region-after extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-name="only" initial-page-number="1">
<fo:static-content flow-name="xsl-region-before">
<fo:block text-align="end" font-size="10pt" font-family="serif"
line-height="14pt">
XML Recommendation - p.
<fo:page-number/>
</fo:block>
</fo:static-content>
<fo:flow flow-name="xsl-region-body">
<fo:block font-size="18pt" font-family="sans-serif"
line-height="24pt" space-after.optimum="15pt"
background-color="blue" color="white" text-align="center"
padding-top="0pt">
Extensible Markup Language (XML) 1.0
</fo:block>
<fo:block font-size="16pt" font-family="sans-serif"
line-height="20pt" space-before.optimum="10pt"
space-after.optimum="10pt" text-align="start" padding-top="0pt">
Abstract
</fo:block>
<fo:block font-size="12pt" font-family="sans-serif"
line-height="15pt" space-after.optimum="3pt" text-align="start">
The Extensible Markup Language (XML) is a subset of SGML that is
completely described in this document. Its goal is to enable generic
SGML to be served, received, and processed on the Web in the way that
is now possible with HTML. XML has been designed for ease of
implementation and for interoperability with both SGML and HTML. For
further information go to
<fo:basic-link external-destination="normal.pdf">normal.pdf</fo:basic-link>
</fo:block>
...
Other XML Companion Recommendations
Other XML Related Recommendations
“The nice thing about standards is that there are so many of them.”
- XML Information Set (Infoset)
- A set of definitions for use in specifications that need to
refer to the information in an XML document.
- XML Linking Language (XLink)
- A language that allows elements to be inserted into XML
documents in order to create and describe sophisticated links between
resources.
- XML Pointer Language (XPointer)
- XPath-based language to be used as a fragment identifier for
any URI-reference that locates an XML resource;
supports addressing into the internal structures of XML documents.
- XML Query
- XPath-based query language, designed to be broadly applicable across many
types of XML data sources.
- Simple API for XML (SAX)
- SAX (not a W3C recommendation) was the first widely adopted API
for XML in Java, and is a de facto standard; now there are versions
for several other programming language environments.
- Document Object Model (DOM)
- A platform- and language-neutral interface that will allow
programs and scripts to dynamically access and update the content,
structure and style of documents.
- XHTML
- XHTML 1.0 is a reformulation of HTML 4
as an XML 1.0 application.