
Expand XSL with extensions

Technique helps expand the capabilities of XSL's core features


Level: Advanced

Jared Jackson (jjared@almaden.ibm.com), Research Associate, IBM

01 Apr 2002

The combined power of XML and XSL for representing, manipulating, and presenting data over the Web and sharing data across differing applications has been clearly demonstrated through the fast acceptance and broad usage of these technologies. Still, most developers familiar with the basics of XML and XSL are not utilizing this power fully. This article shows developers how to use extensions, a technique that allows you to expand the capabilities of XSL.

In terms of both power and simplicity, the combination of XML and XSL has revolutionized data storage and manipulation in a way not seen since the early days of the SQL database language. XML provides a clear and independent way of recording data that is easily shared and understood. Similarly, many people feel that XSL is also easy to read, write, and understand. Clearly, this powerful duo is essential knowledge for everyone involved in the technology industry.

The broad scope and small learning curve associated with the basic elements of XSL transformation sometimes act as a double-edged sword -- yielding broad usage of the core technology but dissuading the majority of developers learning XSL from investigating and using its more advanced and powerful features.

This article is written for developers who already have a basic understanding of XML and XSL, and are ready to build on this knowledge. If you are unfamiliar with these technologies, you can find several good introductory articles and tutorials on developerWorks and other Web sites. The article shows you how to use extensions -- a technique present in most XSL processors -- which allows virtually unlimited expansion of the existing capabilities of XSL's core features. This article includes a general description of how to write extensions with code, followed by three specific and widely applicable examples.

What are XSL extensions?

It must first be understood that XSL, like all other programming languages, is merely a grammar specification in need of an implementation. Fortunately, XSL has become very popular and there are several implementations to choose from. Extensions are not a required feature of the grammar and, thus, their syntax is not as well defined as the other constructs of the language. They are, however, now included in the W3C's XSLT Recommendation. The examples in this article will follow the format of that recommendation.

Simply put, extensions are a way of calling a method written in some other programming language from within an XSL document. Usually, the extension methods are written in the same language as that of the XSL processor. There are exceptions to this rule: Java, for example, can be made to run programs in other languages such as JavaScript or Perl. Thus it is possible to write XSL extensions in JavaScript, Perl, or some other language and make use of them through a Java-based XSL processor.

What makes these extensions so significant when XSL can already do so much? What XSL gains in simplicity and broad ability for transformation is often lost in efficiency and in the ability to do anything unrelated to transformation. For instance, suppose you have an XML document that lists 5,000 users of your system. The user name, real name, and e-mail address of each of these users are given under a Users node within the XML. You later append to the XML document an Interests node in a separate subtree of the XML with user names grouped by particular interests such as acrobatics, bicycling, and computers. You hope eventually to transform the data into an HTML page that groups users by interests and presents e-mail contacts for people of similar interests. XSL can do this handily with the following code:


Listing 1. User interest XSL transformation without extensions

<xsl:for-each select="Interests/Interest">
  <b><xsl:value-of select="@InterestName"/></b>
  <ul>
    <xsl:for-each select="User">
      <xsl:variable name="userName" select="@userName"/>
      <xsl:variable name="userNode"
                    select="/Root/Users/User[@userName = $userName]"/>
      <li>
        <xsl:value-of select="$userNode/@realName"/>
        <xsl:value-of select="concat(' ', $userNode/@email)"/>
      </li>
    </xsl:for-each>
  </ul>
</xsl:for-each>

Unfortunately, the way the transform executes, the entire list of 5,000 users will be examined for each user in each interest category. This is far more work than you want your server to do for each request to this Web page.

Extensions provide a convenient way around this and several other possible hang-ups that you may encounter when using XSL on nontrivial data sets. In the above example, a simple hashmap or binary search tree could have easily solved the problem, but implementing one of these data structures in XSL would be inconvenient and unnecessary. Extending XSL into a language that has more appropriate data types fixes the problem more easily. (Incidentally, the code for this fix is given in the first example below.)



Technologies used in this article

It would be a daunting task to list all of the XSL processors and their methods for implementing extensions. This article uses the Java version of Xalan -- a popular and freely available XSL processor from the Apache Project -- to describe the specifics of writing extensions. All of the examples are targeted to that platform. (Xerces, another Apache product, is used as the XML parser. You can download Xalan and Xerces from links in Resources.) Most other popular XSL implementations also provide a mechanism for extensions, but you'll need to consult their documentation to find any differences in approach.

To simplify working with XML and XSL, I have also provided Java code for some of the more common XML manipulations. This code, along with the code and data necessary to run all of the examples, is provided in a zip file in Resources. This file does not, however, include external libraries such as Xalan and Xerces. After you obtain those libraries by following links in Resources (versions: Xalan - Java 2.3.1; Xerces 1.4.4), place their jar files in the lib directory extracted from the zip file. For those readers who wish to jump directly to the examples, all Java code is in the src directory, XML data in the XML directory, XSL transforms in the XSL directory, batch files in the bin directory, and compiled code in the lib directory.



Creating an extension

In order to call a method from XSL, that method must first be written and its compiled form placed in the classpath of the application that is performing the XSL transformation. Methods may be of your own design, supplied by the standard libraries of Java, or taken from other Java libraries. In some XSL processors, like Xalan, there are even extension methods written directly into the processor.

The first thing to be aware of when you write or use these methods is the mapping of data types from XSL to Java and back again. The following tables provide a reference to these mappings in Xalan.

Tables 1 and 2. Data type mappings

Table 1. Parameter mapping (XSLT to Java)

XSLT Type               Java Type
Node Set                org.w3c.dom.traversal.NodeIterator
String                  java.lang.String
Boolean                 java.lang.Boolean
Number                  java.lang.Double
Result Tree Fragment    org.w3c.dom.DocumentFragment

Table 2. Return type mapping (Java to XSLT)

Java Type                              XSLT Type
org.w3c.dom.traversal.NodeIterator     Node Set
org.apache.xml.dtm.DTM                 Node Set
org.apache.xml.dtm.DTMAxisIterator     Node Set
org.apache.xml.dtm.DTMIterator         Node Set
org.w3c.dom.Node                       Node Set
java.lang.String                       String
java.lang.Boolean                      Boolean
java.lang.Number                       Number
org.w3c.dom.DocumentFragment           Result Tree Fragment
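
To make the mappings concrete, here is a small sketch of two extension methods whose signatures follow the tables above. The class name TypeMappingDemo and both method names are hypothetical and do not appear in the article's zip file. The first method takes an XSLT node set and returns a number, and the second takes a string and returns a boolean.

```java
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical helper class, shown only to illustrate the type mappings.
public class TypeMappingDemo {

    // An XSLT node set arrives as a NodeIterator; the returned Double
    // (a java.lang.Number) maps back to an XSLT number.
    public static Double countNodes(NodeIterator nodes) {
        int count = 0;
        while (nodes.nextNode() != null) {
            count++;
        }
        return Double.valueOf(count);
    }

    // An XSLT string arrives as a java.lang.String; the returned Boolean
    // maps back to an XSLT boolean.
    public static Boolean isBlank(String text) {
        return Boolean.valueOf(text == null || text.trim().isEmpty());
    }
}
```

Returning the wrapper types (Double, Boolean) rather than primitives keeps the signatures aligned with the tables. Whether a given processor also accepts primitives is something to check in its documentation.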

Once your methods are written, incorporating them into XSL is fairly simple. The first step is to declare a namespace for your methods in the <xsl:stylesheet> element. For example, if you want to run methods from a class called foo in package com.myCompany.XSLExtensions, the root of your XSL file would contain the following line:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:extension="xalan://com.myCompany.XSLExtensions.foo">

If you later want to call a method from the class you have declared, use the namespace declared in the <xsl:stylesheet> element. Continuing the example, in order to run a method called bar() that takes a String as a parameter and returns a String, you might use code like the following:

<xsl:variable name="myParam" select="'theParameter'"/>
<xsl:variable name="myResult" select="extension:bar($myParam)"/>

It's that simple. The myResult variable now contains the result of calling bar from your Java class. To obtain a better grasp on the technique, work through the following three examples.
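
For reference, the Java side of this example might look like the sketch below. The article does not say what bar() does, so the body here (trimming and upper-casing the parameter) is purely an assumption for illustration. The method is static so that it can be called through the class-based namespace without creating an instance first.

```java
// In a real project this file would start with:
//   package com.myCompany.XSLExtensions;
// The declaration is left as a comment so the sketch compiles on its own.
import java.util.Locale;

// Class and method names follow the article's example; the body is hypothetical.
public class foo {

    // Takes an XSLT string and returns an XSLT string.
    public static String bar(String param) {
        if (param == null) {
            return "";
        }
        return param.trim().toUpperCase(Locale.ROOT);
    }
}
```

With this body, the myResult variable in the stylesheet fragment above would hold the string THEPARAMETER.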



Example 1: Lookup tables

The beginning of this article presented a scenario in which the use of standard XSL techniques for looking up data in distinct subtrees of an XML document used excessive amounts of compute time. A simple way around this is to create a general purpose hashtable that provides a mechanism for storing and retrieving strings. Since hashtables are built directly into the standard Java libraries, writing an extension that uses them should be painless.

The hashtable Java code is found in the src/StringHash.java file contained in the zip file in Resources. It has two methods of note:

  1. addString(String tableName, String key, String value)
  2. getString(String tableName, String key)

The first method allows the creation of hashtables associated with a table name plus the insertion of string values mapped to a key. The second method provides a means for retrieving the stored values.
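
The real StringHash source is in the zip file. The following is only a minimal sketch of how such a class could be written, under a few assumptions: both methods are static, addString returns an empty string so that wrapping the call in xsl:value-of emits nothing, and a missing table or key yields an empty string rather than an error. The internal map layout is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a string lookup table usable as an XSL extension.
public class StringHash {

    // Table name -> (key -> value). Static so that values stored early in a
    // transformation are still there when they are looked up later.
    private static final Map<String, Map<String, String>> TABLES = new HashMap<>();

    // Creates the named table on first use and stores value under key.
    public static synchronized String addString(String tableName, String key, String value) {
        TABLES.computeIfAbsent(tableName, name -> new HashMap<>()).put(key, value);
        return ""; // nothing to write into the output document
    }

    // Returns the stored value, or an empty string if the table or key is unknown.
    public static synchronized String getString(String tableName, String key) {
        Map<String, String> table = TABLES.get(tableName);
        if (table == null) {
            return "";
        }
        String value = table.get(key);
        return value == null ? "" : value;
    }
}
```

Each lookup is now a constant-time hash access instead of a scan of the whole Users subtree. One trade-off of the static map is that entries persist between transformations in the same JVM, so a long-running server would need some way to clear or scope the tables.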

An XML data source is found in the XML/user_interests.xml file (see the zip file in Resources). It follows the form:


Listing 2. User interest XML fragment

<Users>
  <User userName="aragon" realName="Aragon" 
	email="aragon@middleEarth.fict"/>
  <User userName="boromir" realName="Boromir" 
	email="boromir@middleEarth.fict"/>
  ...
</Users>
<Interests>
  <Interest name="archery">
    <User userName="legolas"/>
    ...
  </Interest>
  ...
</Interests>

Two XSL files are given in the zip file in Resources for producing the Web page result. The first is found in the XSL/user_interests_xsl_only.xsl file and follows the code shown in Listing 1. The second is found in the XSL/user_interests_extensions.xsl file which modifies the former XSL file to the code shown in Listing 3. To easily run the XSL conversion on Windows, use the bin/Example_1*.bat batch files. Unix and Mac developers should have little trouble running the examples after examining these batch files.


Listing 3. User interest XSL transformation with extensions

<xsl:stylesheet xmlns:lookup="xalan://StringHash">
...
<xsl:for-each select="Users/User">
  <xsl:value-of select="lookup:addString('realName', string(@userName), 
	string(@realName))"/>
  <xsl:value-of select="lookup:addString('email', string(@userName), 
	string(@email))"/>
</xsl:for-each>
...
<li>
  <xsl:value-of select="lookup:getString('realName',$userName)"/>
  <xsl:value-of select="concat(' - ',lookup:getString('email',
	$userName))"/>
</li>



Example 2: Regular expressions

The current XSL standard uses the XPath technology to perform all of its pattern matching. While XPath provides a compact and elegant way of traversing an XML tree, its pattern matching functions have a rather limited capability. (The only string functions in XPath 1.0 that perform boolean matching are starts-with() and contains(). You can also automatically parse strings into numbers.) Regular expressions provide much richer pattern matching across strings of text, but are not as easy to use as XPath when traversing a data structure such as an XML tree. For more detailed information on regular expressions, see Resources.

The optimum solution is to combine the two technologies. The next version of the XSL transformation language, which is still under development and review, includes a proposal to add regular expressions to the language. For developers who want to use the technology now, extensions provide the mechanism for doing so.

The source code for the Java methods accessed as extensions can be found in the src/PatternMatcher.java file contained in the zip file accompanying this article. These methods make use of external code that is not contained within the standard Java libraries, so this example also shows what steps are necessary to link external jar files for use in extensions. You will need to obtain the regular expression jar file provided by GNU (see Resources) and place it in the extracted lib directory in order to get the examples to work. Feel free to find another regular expression package and modify the code to fit it.

For the second example, suppose you wish to list only those users from the original source whose first and last names are both known. While this is a fairly trivial example, it is not difficult to imagine more complicated examples working on groups of users, product catalogs, or reference databases. A simple way to do this is to look through the real names of the users and match those names which consist of one name followed by a space followed by another name. The regular expression for this is \w* \w*. The XSL now contains the lines in Listing 4.


Listing 4. Regular expressions in XSL

<xsl:stylesheet xmlns:regexp="xalan://PatternMatcher">
...
<ul>
  <xsl:for-each select="Users/User[regexp:containsMatch('\w* \w*',
	string(@realName))]">
    <li>
      <xsl:value-of select="@realName"/>
    </li>
  </xsl:for-each>
</ul>

Similar to Example 1, this example can be executed through the bin/Example_2.bat file. You can find the XSL file used at XSL/user_last_names.xsl. The possibilities for extension on this technique are infinite.
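
The article's PatternMatcher is built on the GNU regular expression package. As an alternative for readers who would rather avoid the extra jar, the sketch below swaps in the java.util.regex package from the standard library. It is not the author's code: only the containsMatch name and its argument order (pattern first, then text) are taken from Listing 4, and the pattern cache is an added assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Sketch of a regular expression extension based on java.util.regex
// instead of the GNU package used in the article's zip file.
public class PatternMatcher {

    // Compiled patterns are cached because the same expression is
    // evaluated once per node in an XPath predicate.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Returns true if any part of text matches the regular expression.
    public static Boolean containsMatch(String regex, String text) {
        Pattern pattern = CACHE.computeIfAbsent(regex, Pattern::compile);
        return Boolean.valueOf(pattern.matcher(text).find());
    }
}
```

Because find() looks for a match anywhere in the string, the expression from Listing 4 accepts any real name that contains a space. A stricter test for exactly two words would need anchors and \w+ in place of \w*.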



Example 3: Internationalization

Internationalization, sometimes referred to as localization or natural language support, is the method by which developers make their products readable across languages and cultures. It is particularly important in the context of XML transformation if the product of the transformation is a set of Web pages that targets a broad audience. While the topic of internationalization is too broad to introduce in a comprehensive way in the context of this example, you can find good treatment of it in other developerWorks articles referenced below.

This example makes use of Java's built-in technique of handling internationalization through the use of resource bundles. If you are unfamiliar with the topic, I encourage you to read the referenced articles. Suffice it to say for now that resource bundles consist of a collection of files that contain translations for different regions or, more precisely, locales. Web servers can read the preferred locale of a user when that user requests a Web page and, using these resource bundles, can respond appropriately. XML-based applications can also target results to a specific locale.

The potential uses of the code in this example are just as wide and varied as the previous one. In order to demonstrate the technology, the code executed by the bin/Example_3.bat file creates three Web pages from the sample XML users data. The three resulting pages represent the same view of the data, but are presented in three different languages. The translations used can be found in properties files in the lib directory extracted from the zip file.
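
The article does not list the Java code for this example, so the following is only a sketch of how a resource bundle lookup could be exposed as an extension. The class name Translator, the method name translate, and its parameters are all hypothetical. The method loads the bundle for the requested locale through the standard java.util.ResourceBundle API and falls back to the key itself when no bundle or translation is found.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical extension class that looks up translated text.
public class Translator {

    // bundleName:  base name of the properties files on the classpath
    // languageTag: locale identifier such as "en", "fr" or "de-CH"
    // key:         name of the text to translate
    public static String translate(String bundleName, String languageTag, String key) {
        try {
            Locale locale = Locale.forLanguageTag(languageTag);
            ResourceBundle bundle = ResourceBundle.getBundle(bundleName, locale);
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            // No bundle or no entry for this key: show the key rather than fail.
            return key;
        }
    }
}
```

From a stylesheet, the locale could be passed in as a parameter so that the same transform is run once per language, which matches the three-page output described above.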



Conclusion

Even the most basic components of XSL transformations have remarkable capabilities. When this core is extended to encompass the power of modern programming languages, the possibilities become virtually limitless. The ideas and examples presented above are but the tip of the iceberg, and I leave it to you, after gaining an understanding of what is presented here, to explore the many remaining possibilities.




Download

Description                     Name                               Size       Download method
Source code for this article    x-callbk/XSL_Callbacks_Code.zip    1588 KB    HTTP


Resources



About the author

Jared Jackson is a Researcher at IBM's Almaden Research Center. He works in the area of Web-based technologies. You can contact Jared at jjared@almaden.ibm.com.



