Level: Advanced
Jared Jackson (jjared@almaden.ibm.com), Research Associate, IBM
01 Apr 2002
The combined power of XML and XSL for representing, manipulating, and presenting data over the Web and sharing data across differing applications has been clearly demonstrated through the fast acceptance and broad usage of these technologies. Still, most developers familiar with the basics of XML and XSL are not using this power fully. This article shows developers how to use extensions, a technique that allows you to expand the capabilities of XSL.
In terms of both power and simplicity, the combination of XML and XSL has revolutionized
data storage and manipulation in a way not seen since the early
days of the SQL database language. XML provides a clear and independent
way of recording data that is easily shared and understood. Similarly, many people
feel that XSL is easy to read, write, and understand. Clearly, this powerful duo
is essential knowledge for everyone involved in the technology industry.
The broad scope and gentle learning curve associated with the
basic elements of XSL transformation sometimes act as a double-edged
sword -- yielding broad usage of the core technology but dissuading the
majority of developers learning XSL from investigating and using its more
advanced and powerful features.
This article is written for developers who already have a basic understanding
of XML and XSL, and are ready to build on this knowledge. If you are unfamiliar
with these technologies, you can find several good introductory articles and tutorials
on developerWorks and other Web sites. The article shows you how to use extensions -- a technique present in most XSL processors -- which allows virtually unlimited expansion of the existing capabilities of XSL's core
features. This article includes a general description of how to write extensions with code,
followed by three specific and widely applicable examples.
What are XSL extensions?
It must first be understood that XSL, like all other programming languages,
is merely a grammar specification in need of an implementation. Fortunately,
XSL has become very popular and there are several implementations to choose
from. Extensions are not a required feature of the grammar and, thus, their syntax
is not as well defined as the other constructs of the language. They are, however,
now included in the W3C's XSLT Recommendation. The examples in
this article follow the format of that recommendation.
Simply put, extensions are a way of calling a method written in some other programming
language from within an XSL document. Usually, the extension methods
are written in the same language as the XSL processor. There are
exceptions to this rule: Java, for example, can be made to run programs in other
languages such as JavaScript or Perl. Thus it is possible to write XSL extensions
in JavaScript, Perl, or some other language and use them through
a Java-based XSL processor.
What makes these extensions so significant when XSL can already do so much?
What XSL gains in simplicity and broad ability for transformation, it often
loses in efficiency and in the ability to do anything unrelated to transformation.
For instance, suppose you have an XML document that lists 5,000 users of your
system. The user name, real name, and e-mail address of each of these users are
given under a Users node within the XML. You later append
to the XML document an Interests node in a separate subtree
of the XML, with user names grouped by particular interests such as acrobatics,
bicycling, and computers. You hope eventually to transform the data into an HTML
page that groups users by interests and presents e-mail contacts for people of
similar interests. XSL can do this handily with the following code:
Listing 1. User interest XSL transformation without extensions
<xsl:for-each select="Interests/Interest">
  <b><xsl:value-of select="@name"/></b>
  <ul>
    <xsl:for-each select="User">
      <xsl:variable name="userName" select="@userName"/>
      <!-- This predicate re-examines every User under Users for each interest entry -->
      <xsl:variable name="userNode" select="/Root/Users/User[@userName = $userName]"/>
      <li>
        <xsl:value-of select="$userNode/@realName"/>
        <xsl:value-of select="concat(' - ', $userNode/@email)"/>
      </li>
    </xsl:for-each>
  </ul>
</xsl:for-each>
Unfortunately, the way the transform executes, the entire list of 5,000 users
will be examined for each user in each interest category. This is far
more work than you want your server to do for each request to this
Web page.
Extensions provide a convenient way around this and several other
hang-ups that you may encounter when using XSL on nontrivial data sets. In the above
example, a simple hashmap or binary search tree could easily have solved the
problem, but implementing one of these data structures in XSL would be inconvenient and
unnecessary. An extension written in a language that has more appropriate
data types fixes the problem more easily. (Incidentally, the code for this
fix is given in the first example below.)
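The cost difference is easier to see in plain Java. The sketch below is illustrative only and is not part of the article's code. The User record, the class name LookupCost, and both method names are hypothetical, and it assumes that user names are unique.

The first method repeats the scan that Listing 1 performs for every interest entry. The second builds a hash index once and then does constant-time lookups, which is the idea behind the extension in Example 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two lookup strategies; assumes unique user names.
public class LookupCost {

    // Stand-in for a <User userName="..." email="..."/> element.
    public record User(String userName, String email) {}

    // Like Listing 1: every membership re-examines every user,
    // so the work grows with memberships.size() * users.size().
    public static List<String> emailsByScan(List<User> users, List<String> memberships) {
        List<String> emails = new ArrayList<>();
        for (String member : memberships) {
            for (User u : users) {
                if (u.userName().equals(member)) {
                    emails.add(u.email());
                }
            }
        }
        return emails;
    }

    // Index once, then look up: the work grows with users.size() + memberships.size().
    public static List<String> emailsByIndex(List<User> users, List<String> memberships) {
        Map<String, String> emailByName = new HashMap<>();
        for (User u : users) {
            emailByName.put(u.userName(), u.email());
        }
        List<String> emails = new ArrayList<>();
        for (String member : memberships) {
            String email = emailByName.get(member);
            if (email != null) {
                emails.add(email);
            }
        }
        return emails;
    }
}
```

Both methods return the same e-mail list. With 5,000 users, however, each interest entry costs 5,000 comparisons in the first method and a single hash lookup in the second.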
Technologies used in this article
It would be a daunting task to list all of the XSL processors and their
methods for implementing extensions. This article uses the Java version
of Xalan -- a popular and freely available XSL processor from the Apache Project --
to describe the specifics of writing extensions. All of the
examples are targeted to that platform. (Xerces, another Apache product, is
used as the XML parser. You can download Xalan and Xerces from links in Resources.)
Most other popular XSL implementations
also provide a mechanism for extensions, but you'll need to consult their documentation
to find any differences in approach.
To simplify working with XML and XSL, I have also provided Java code for some
of the more common XML manipulations. This code, along with the code and data
necessary to run all of the examples, is provided in a zip file in Resources.
This file does not, however, include external libraries such as Xalan and
Xerces. After you obtain those libraries by following links in Resources (versions: Xalan - Java 2.3.1; Xerces 1.4.4),
place their jar files in the lib directory extracted from the zip file. For those readers
who wish to jump directly to the examples, all Java code is in the src
directory, XML data in the XML directory, XSL transforms in the XSL
directory, batch files in the bin directory, and compiled code in the lib directory.
Creating an extension
In order to call a method from XSL, that method must first be written and its
compiled form placed in the classpath of the application that is performing
the XSL transformation. Methods may be of your own design, supplied by the standard
libraries of Java, or taken from other Java libraries. In some XSL processors,
like Xalan, there are even extension methods written directly into the processor.
The first thing to be aware of when you write or use these methods is the mapping
of data types from XSL to Java and back again. The following table provides
a reference to these mappings in Xalan.
Tables 1 and 2. Data type mappings in Xalan
Table 1. Parameter mapping (XSLT to Java)
| XSLT type | Java type |
|---|---|
| Node Set | org.w3c.dom.traversal.NodeIterator |
| String | java.lang.String |
| Boolean | java.lang.Boolean |
| Number | java.lang.Double |
| Result Tree Fragment | org.w3c.dom.DocumentFragment |
Table 2. Return type mapping (Java to XSLT)
| Java type | XSLT type |
|---|---|
| org.w3c.dom.traversal.NodeIterator, org.apache.xml.dtm.DTM, org.apache.xml.dtm.DTMAxisIterator, org.apache.xml.dtm.DTMIterator, org.w3c.dom.Node | Node Set |
| java.lang.String | String |
| java.lang.Boolean | Boolean |
| java.lang.Number | Number |
| org.w3c.dom.DocumentFragment | Result Tree Fragment |
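As a hedged illustration of these mappings, here is a small extension class whose method signatures follow the tables. The class name MappingDemo and its three methods are hypothetical. An XSLT number arrives as java.lang.Double, and returning a java.lang.Boolean gives back an XSLT boolean that can be used in a predicate.

```java
// Hypothetical extension class illustrating Xalan's parameter and return type mappings.
public class MappingDemo {

    // XSLT string in (java.lang.String), XSLT string out.
    public static String shout(String s) {
        return s.toUpperCase();
    }

    // XSLT numbers arrive as java.lang.Double; a Double (a java.lang.Number)
    // returned here becomes an XSLT number again.
    public static Double half(Double n) {
        return n / 2;
    }

    // A java.lang.Boolean result becomes an XSLT boolean, usable in a predicate.
    public static Boolean isLonger(String s, Double limit) {
        return s.length() > limit;
    }
}
```

Assuming the compiled class is on the classpath, a stylesheet that declares xmlns:demo="xalan://MappingDemo" could call it as, for example, <xsl:value-of select="demo:half(10)"/>.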
Once your methods are written, incorporating them into XSL is fairly simple.
The first step is to declare a namespace for your methods in the <xsl:stylesheet> element. For example, if you want to run methods from a class called foo in package com.myCompany.XSLExtensions, the root of your XSL file would contain the following line:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:extension="xalan://com.myCompany.XSLExtensions.foo">
If you later want to call a method from the class you have declared, use the namespace
declared in the <xsl:stylesheet> element. Continuing the example, in order
to run a method called bar() that takes a String as a parameter and returns a String, you might use code like the following:
<xsl:variable name="myParam" select="'theParameter'"/>
<xsl:variable name="myResult" select="extension:bar($myParam)"/>
It's that simple. The myResult variable now contains the result of calling
bar from your Java class. To obtain a better grasp on the technique, work
through the following three examples.
Example 1: Lookup tables
The beginning of this article presented a scenario in which the use of standard
XSL techniques for looking up data in distinct subtrees of an XML document used
excessive amounts of compute time. A simple way around this is to
create a general-purpose hashtable that provides a mechanism for storing and
retrieving strings. Since hashtables are built directly into the standard Java
libraries, writing an extension that uses them should be painless.
The hashtable Java code is found in the src/StringHash.java file
contained in the zip file in Resources. It has two methods of note:
addString(String tableName, String key, String value)
getString(String tableName, String key)
The first method allows the creation of hashtables associated with a table
name plus the insertion of string values mapped to a key. The second method
provides a means for retrieving the stored values.
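The actual src/StringHash.java is in the zip file. The following is only a sketch of how a class with these two methods might be written, assuming static methods and string return values.

addString returns an empty string so that the <xsl:value-of> calls in Listing 3 write nothing to the output. getString also returns an empty string for unknown tables or keys.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a StringHash-style extension; not the file shipped with the article.
public class StringHash {

    // One map per table name. The maps are static so that values stored by one
    // extension call are visible to later calls in the same JVM.
    private static final Map<String, Map<String, String>> tables = new HashMap<>();

    // Stores value under key in the named table, creating the table on first use.
    // Returns "" so that <xsl:value-of> emits nothing.
    public static synchronized String addString(String tableName, String key, String value) {
        tables.computeIfAbsent(tableName, name -> new HashMap<>()).put(key, value);
        return "";
    }

    // Returns the stored value, or "" if the table or key is unknown.
    public static synchronized String getString(String tableName, String key) {
        Map<String, String> table = tables.get(tableName);
        if (table == null) {
            return "";
        }
        String value = table.get(key);
        return value == null ? "" : value;
    }
}
```

Because the tables are static, they persist across transformations. That is convenient for a batch run, but a server handling many requests would need to clear them or scope them per request.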
An XML data source is found in the XML/user_interests.xml file
(see the zip file in Resources). It follows the form:
Listing 2. User interest XML fragment
<Users>
<User userName="aragon" realName="Aragon"
email="aragon@middleEarth.fict"/>
<User userName="boromir" realName="Boromir"
email="boromir@middleEarth.fict"/>
...
</Users>
<Interests>
<Interest name="archery">
<User userName="legolas"/>
...
</Interest>
...
</Interests>
Two XSL files are given in the zip file in Resources for producing the Web page result. The first is found in the XSL/user_interests_xsl_only.xsl file and follows the code shown
in Listing 1. The second, found in the XSL/user_interests_extensions.xsl file,
modifies the former with the code shown in Listing 3.
To easily run the XSL conversion on Windows, use the bin/Example_1*.bat batch files. Unix and Mac developers should have little trouble running the examples after examining these batch files.
Listing 3. User interest XSL transformation with extensions
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:lookup="xalan://StringHash">
...
<xsl:for-each select="Users/User">
<xsl:value-of select="lookup:addString('realName', string(@userName),
string(@realName))"/>
<xsl:value-of select="lookup:addString('email', string(@userName),
string(@email))"/>
</xsl:for-each>
...
<li>
<xsl:value-of select="lookup:getString('realName',$userName)"/>
<xsl:value-of select="concat(' - ',lookup:getString('email',
$userName))"/>
</li>
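The batch files run these transformations from the command line. If you would rather drive a transformation from Java, here is a minimal sketch using the standard JAXP API (javax.xml.transform). The class name RunTransform is hypothetical.

Xalan-Java registers itself as a JAXP TransformerFactory. With xalan.jar on the classpath, TransformerFactory.newInstance() should therefore return Xalan's factory, which the xalan:// extensions in this article were written for.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper that applies a stylesheet through JAXP.
public class RunTransform {

    // Applies the stylesheet to the XML document and returns the serialized result.
    public static String transform(Source xsl, Source xml) throws TransformerException {
        // Picks up Xalan if its jar is on the classpath; otherwise the JDK's built-in processor.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(xsl);
        StringWriter out = new StringWriter();
        transformer.transform(xml, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Paths follow the directory layout of the article's zip file.
        String html = transform(
                new StreamSource("XSL/user_interests_extensions.xsl"),
                new StreamSource("XML/user_interests.xml"));
        System.out.println(html);
    }
}
```

The compiled extension classes, such as StringHash, must be on the same classpath as the processor for the xalan:// namespaces to resolve.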
Example 2: Regular expressions
The current XSL standard uses XPath to perform all of its pattern
matching. While XPath provides a compact and elegant way of traversing an XML
tree, its string-matching capabilities are rather limited. (The only XPath 1.0 string functions that perform boolean matching are starts-with() and contains(). You can also convert strings to numbers with number().)
Regular expressions provide much
richer pattern matching across strings of text, but they are not nearly as convenient as XPath for traversing
a data structure such as an XML tree. For more detailed information on regular expressions, see Resources.
The optimum solution is to combine the two technologies. The next version of
the XSL transformation language, which is still under development and review,
includes a proposal to add regular expressions to the language.
For developers who want to use the technology now, extensions
provide the mechanism for doing so.
The source code for the Java methods accessed as extensions can be found in
the src/PatternMatcher.java file contained in the zip file accompanying
this article. These methods use external code that is not contained
within the standard Java libraries, so this example also shows the steps
necessary to link external jar files for use in extensions. To get the examples to work,
you will need to obtain the regular expression jar file provided by GNU (see Resources)
and place it in the extracted lib directory. Feel free to find another regular expression
package and modify the code to fit it.
For the second example, suppose you wish to generate a list of users from the
original source, for which the first and last names of those users are known.
While this is a fairly trivial example, it is not difficult to imagine more complicated
examples working on groups of users, product catalogs, or reference databases.
A simple way to do this is to look through the real names of the
users and match those names that contain one name followed by a space followed
by another name. The regular expression for this is \w* \w* (a run of word
characters, a space, and another run of word characters).
The XSL now contains the lines in Listing 4.
Listing 4. Regular expressions in XSL
<xsl:stylesheet xmlns:regexp="xalan://PatternMatcher">
...
<ul>
<xsl:for-each select="Users/User[regexp:containsMatch('\w* \w*',
string(@realName))]">
<li>
<xsl:value-of select="@realName"/>
</li>
</xsl:for-each>
</ul>
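The article's PatternMatcher uses the GNU regular expression package. As a hedged alternative, the sketch below swaps in java.util.regex from the standard library, so no extra jar is needed.

The signature is inferred from the call in Listing 4: a pattern and a string come in, and a java.lang.Boolean goes out, which the return type mapping table above turns into an XSLT boolean usable in the predicate. Compiled patterns are cached because the same pattern is applied once per user.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Sketch of a PatternMatcher-style extension using java.util.regex
// instead of the GNU package used by the article's code.
public class PatternMatcher {

    // Compiled patterns, keyed by their source text, so each regex is compiled once.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // True if any part of input matches regex (find(), not a whole-string match).
    public static Boolean containsMatch(String regex, String input) {
        Pattern pattern = CACHE.computeIfAbsent(regex, Pattern::compile);
        return Boolean.valueOf(pattern.matcher(input).find());
    }
}
```

Because find() searches anywhere in the string, \w* \w* effectively selects every real name that contains a space. Anchoring the pattern as ^\w+ \w+$ would restrict the match to exactly two names.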
Similar to Example 1, this example can be executed through the bin/Example_2.bat
file. You can find the XSL file used at XSL/user_last_names.xsl.
The possibilities for extending this technique are virtually endless.
Example 3: Internationalization
Internationalization, sometimes referred to as localization or natural language
support, is the method by which developers make their products readable across
languages and cultures. It is particularly important in the context of XML transformation
if the product of the transformation is a set of Web pages that targets a broad
audience. While the topic of internationalization is too broad to introduce comprehensively in the
context of this example, you can find good treatment of it in other developerWorks articles (see Resources).
This example makes use of Java's built-in technique of handling internationalization
through the use of resource bundles. If you are unfamiliar with the topic, I encourage
you to read the referenced articles. Suffice it to say for now that resource bundles
consist of a collection of files that contain translations for different regions
or, more precisely, locales. Web servers can read the preferred locale of a
user when that user requests a Web page and, using these resource
bundles, can respond appropriately. XML-based applications can also target results
to a specific locale.
The potential uses of the code in this example are just as wide and varied
as those of the previous one. To demonstrate the technology, the code executed
by the bin/Example_3.bat file creates three Web pages from the sample
XML users data. The three resulting pages represent the same view of the data,
but are presented in three different languages. The translations used can be
found in properties files in the lib directory extracted from the zip file.
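The Java code for this example is in the src directory of the zip file. As a hedged illustration only, a resource-bundle extension might look like the sketch below. The class name Translator, the method translate, and the bundle name UserPage used in the usage note are hypothetical.

When a bundle or key is missing, the sketch returns the key itself. A gap in a properties file then shows up in the page instead of aborting the transformation.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical resource-bundle extension; not the class shipped with the article.
public class Translator {

    // Looks up key in the bundle for the given locale, e.g. ("UserPage", "fr", "title").
    public static String translate(String bundleName, String languageTag, String key) {
        try {
            // Finds e.g. UserPage_fr.properties on the classpath, falling back through
            // related locales (including the JVM default) to UserPage.properties.
            ResourceBundle bundle =
                    ResourceBundle.getBundle(bundleName, Locale.forLanguageTag(languageTag));
            return lookup(bundle, key);
        } catch (MissingResourceException e) {
            // No bundle at all: show the key rather than failing the transform.
            return key;
        }
    }

    // Returns the translation, or the key itself if the bundle lacks it.
    public static String lookup(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return key;
        }
    }
}
```

With the properties files on the classpath, a stylesheet declaring xmlns:translator="xalan://Translator" could emit a localized heading with <xsl:value-of select="translator:translate('UserPage', 'fr', 'title')"/>. Running one transformation per locale would produce the three pages described above.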
Conclusion
Even the most basic components of XSL transformations have remarkable capabilities. When this core is augmented with extensions that bring in
the power of modern programming languages, the possibilities become
virtually limitless. The ideas and examples presented above are but the tip
of the iceberg, and I leave it to you, after gaining an understanding of
what is presented here, to explore the many remaining possibilities.
Download
| Description | Name | Size |
|---|---|---|
| Source code for this article | x-callbk/XSL_Callbacks_Code.zip | 1588 KB |
Resources
- Download the zip file containing all of the code related to this article.
- Download Xerces (XML parser and DOM implementation) and Xalan (XSL transformer) from the Apache XML Project.
- Read articles and explore tutorials on XML/XSL:
- Get answers to your questions about regular expressions with this guide.
- For more on internationalization and resource bundles, read my developerWorks article "Harnessing internationalization."
- Finally, take a look at IBM WebSphere Studio Application Developer, an easy-to-use, integrated development environment for building, testing, and deploying J2EE applications, including generating XML documents from DTDs and schemas.
About the author
Jared Jackson is a Researcher at IBM's Almaden Research Center. He works in the area of Web-based technologies. You can contact Jared at jjared@almaden.ibm.com.