Level: Intermediate
Chen Shu (chenshu@us.ibm.com), Software Engineer, IBM
Nianjun Zhou (jzhou@us.ibm.com), Advisory Software Engineer, IBM
Dikran Meliksetian (Dikran_Meliksetian@us.ibm.com), Senior Technical Staff Member, IBM
25 Mar 2003
This article describes a methodology for building an XML-based, end-to-end, multi-tiered solution by leveraging XSLT technology. The authors introduce this methodology through an example application in which XSLT is used not only for transformation at the presentation layer, but also for retrieving data from heterogeneous data repositories and generating data-centric XML documents at the back end. The application also performs data computation, such as statistical analysis, in the middle tier.
As a widely accepted standard data format, XML lets multiple
application components integrate seamlessly. Here's what the W3C has to say about XSL:
XSL is a language for expressing stylesheets. It consists of three parts:
XSL Transformations (XSLT): a language for
transforming XML documents,
the XML Path Language (XPath), an expression language used by XSLT to access or refer to
parts of an XML document. (XPath is also used by the XML Linking specification).
The third
part is XSL Formatting Objects (XSL-FO): an XML vocabulary for specifying formatting
semantics. An XSL stylesheet usually specifies the presentation of a class of
XML documents by describing how an instance of the class is transformed into an
XML document that uses the formatting vocabulary.
However, XSLT can be used
to perform additional tasks within an application that uses XML as its main data
representation model. (See Resources for more information on XSLT, XPath, and XSL-FO.)
In this article, we demonstrate how to build an application
where XSLT is used beyond its traditional role of format transformation. Within
the example application, we leverage XSLT to accomplish the following tasks:
- Transform the repository data from a relational representation to XML
- Perform statistical analysis of the data represented as an XML document
- Generate an XML document based on a particular business logic
- Render the XML as HTML, WML, and VXML
This application demonstrates that you can easily create search applications using legacy information and serve the results in multiple output formats by adding the proper XSL transformation scripts. The methodology can be used in a variety of applications that need simple data analysis and data format transformation.
This article is organized as follows. In the next section, we briefly introduce the
example application and the requirements upon which we built it.
In the following section, we describe the architecture of our solution, followed
by a detailed description of the XML transformations required within the
application. Finally, we conclude by stressing the versatility of the solution
with respect to changes in requirements, as well as input and output data
formats.
The example
The example application is a search framework that attempts to minimize the
sequence of questions and answers required to find a searched object. The
system is called Guided Adaptive Search Framework (GASF). Figure 1
depicts the process flow of the GASF system.
Figure 1. Flow of GASF

To illustrate this process, let's consider a particular
implementation of GASF, such as an employee directory search system for a large
corporation with many branches in multiple cities. An end-user wants to find
specific information, such as the mailing address of a person named John Smith
who works in New York City. The
system prompts the user with a list of questions (the questionnaire) which might include
first name, last name, city, telephone number, and whether this person is a
manager or not. The user selects a
question to answer (for example, first name) and gives the answer (let's say
"John"). The system performs a repository search and finds a large number of
results. The system creates a second questionnaire by removing the already
answered question and reorganizing the order of the remaining questions based
on the statistical distribution of the possible values for these questions. The
user is then presented with this second questionnaire. This sequence is repeated
until either the name is found or it is determined that such a person does
not exist. One of the objectives of the process and the statistical analysis is
to minimize the number of question-answer cycles.
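The question-answer cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `people` list stands in for the repository, and the names `narrow` and `search_status` are hypothetical, not part of GASF.

```python
from typing import Dict, List

Record = Dict[str, str]

def narrow(records: List[Record], answers: Dict[str, str]) -> List[Record]:
    """Keep only the records that match every answer given so far."""
    return [r for r in records
            if all(r.get(attr) == value for attr, value in answers.items())]

def search_status(matches: List[Record]) -> str:
    """Classify the outcome of one question-answer cycle."""
    if not matches:
        return "not found"
    if len(matches) == 1:
        return "found"
    return "ask again"

# Hypothetical sample data, not taken from the article's database.
people = [
    {"firstname": "John", "lastname": "Smith", "city": "New York"},
    {"firstname": "John", "lastname": "Smith", "city": "Miami"},
    {"firstname": "Jane", "lastname": "Doe", "city": "Miami"},
]

first_pass = narrow(people, {"firstname": "John"})
second_pass = narrow(people, {"firstname": "John", "city": "Miami"})
print(search_status(first_pass))   # prints "ask again": two candidates remain
print(search_status(second_pass))  # prints "found": one candidate remains
```

Each additional answer can only shrink the candidate set, which is why the order in which questions are offered determines how many cycles are needed.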
The requirements
The requirements for the GASF can be summarized as follows:
- The data repository can be implemented with either a relational database or an LDAP
directory with arbitrary schemas. Although data retrieved from RDBMS or LDAP repositories
have vastly different formats, the GASF presents the data to the application in a uniform
format irrespective of the origin.
- The system filters out the questions that are irrelevant or unnecessary for
the search in progress. For example, if in the current state of the search it
is determined that a certain attribute for all the prospective results has the
same value, that attribute is irrelevant and no question based on that
attribute should be generated.
- The system minimizes the number of questionnaire preparation-answer
processing cycles.
- The type of device that the request is coming from dictates how the output is rendered. For example, if a phone is used to request the search, the questionnaire is presented using VoiceXML. If a PDA is used, then the questionnaire is rendered as WML to accommodate the small screen of a PDA.
- Finally, we are targeting the development of a framework to be used in building similar applications with
potentially different input/output requirements. We want to minimize the coding effort by externalizing
the application-specific requirements. We accomplish this by making the framework initialize itself with
application-specific configuration files. For example, the data repository location, the list of attributes, and the output formats are all specified in the configuration files.
Architecture of the solution
To satisfy the requirements for GASF, we have come up with the following
XML-based end-to-end solution (which is illustrated in Figure 2):
- Use XML Integrator (XI Engine -- see Resources) to interface to the data repository.
A script specifies the search and converts the retrieved data into XML.
- Use an Analysis Engine to
calculate the distribution of the values of the retrieved attributes, and
to determine what questions to ask and in which order. For example, if a
particular attribute of all the retrieved instances has the same value,
that attribute is dropped from the next set of questions. The attribute
that has the widest distribution of values is considered the best
candidate to be the first question asked in the next cycle. The
questionnaire is created based on the XML representing this statistical
information that is generated by the Analysis Engine.
- Use a Transformation Engine
to transform the questionnaire and present it to the user based on the type of device being used.
Figure 2. Architecture of GASF

This methodology is intended as a middleware
solution that is independent of the underlying data repository schema and of the
presentation layer. Its advantages are the separation of
transformation-specific code from the application logic, and the ease with
which changes in either the database schema or the XML structure can be
handled without changing application code.
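One way to picture this separation is a small lookup that maps the requesting device type to a style sheet, so that supporting a new device means adding an entry rather than changing application logic. The sketch below is hypothetical: the device labels, the file names, and the function `pick_stylesheet` are assumptions made for illustration and do not come from GASF.

```python
# Hypothetical mapping from device type to style sheet file name.
STYLESHEETS = {
    "browser": "questionnaire-html.xsl",
    "pda": "questionnaire-wml.xsl",
    "phone": "questionnaire-vxml.xsl",
}

def pick_stylesheet(device_type: str) -> str:
    """Return the style sheet configured for a device type, or raise ValueError."""
    try:
        return STYLESHEETS[device_type]
    except KeyError:
        raise ValueError(f"no style sheet configured for device type {device_type!r}")

print(pick_stylesheet("pda"))  # prints "questionnaire-wml.xsl"
```

In a real deployment this table would more likely live in one of the configuration files mentioned in the requirements than in code.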
Details of the solution
In this section, we describe the various XSL style sheets and the XML format used
in a simplified example of the employee directory search application. Let's
assume that the employee data are stored in a relational database with the
following schema, which uses IBM DB2 as an example:
Listing 1. Schema of employee database
CONNECT TO EMPLOYEE;
-- DDL Statements for table "EMPLOYEE"."EMPLOYEE"
CREATE TABLE "EMPLOYEE"."EMPLOYEE" (
"ID" CHAR(6) NOT NULL ,
"LASTNAME" VARCHAR(100) ,
"CITY" VARCHAR(100) NOT NULL ,
"FIRSTNAME" VARCHAR(100) NOT NULL ,
"PHONE" VARCHAR(100) NOT NULL ,
"ADDRESS" VARCHAR(100) NOT NULL ,
"ISMANAGER" CHAR(1) NOT NULL )
IN "USERSPACE1" ;
COMMIT WORK;
CONNECT RESET;
TERMINATE;
XI Engine
The XML Integrator (XI) Engine is a tool for converting data between XML and structured data formats
such as relational or LDAP data. The XI Engine is based on a script representing the
relationship between the two information structures. It is available on IBM alphaWorks. More details on XI are available in Resources. Currently,
XI supports two notations, DTDSA and XRT, to specify this relationship.
In our example, we use the XRT notation, which is a loose extension
of XSL that ties together query statements with an XSL transformation.
Listing 2 shows the XRT script that is used to retrieve
data from the Employee
database and create the intermediate data XML shown in Listing 3.
The XRT script contains two parts. The first
part defines how the data is retrieved from the data repository, which
includes the location of the data source, the SQL queries, and the relationship
of the queries. The second part defines an XML transformation template. The
execution of the XRT script first creates an internal XML representation that abides by
a standard XRT schema; this XML representation is then
transformed using the template that's defined by the second part of the XRT script.
For our application, the template in the second part is the same for all searches, but
the queries themselves change dynamically: more search
constraints are added based on the answers from the user.
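Adding one constraint per answer can be sketched as follows. This is a minimal illustration, not the XI Engine's mechanism: the XRT script in Listing 2 embeds literal values in the query text, whereas this sketch uses bind parameters, and Python's built-in `sqlite3` module stands in for DB2. The function `build_query` and the sample rows are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Columns a user may constrain; the names follow the schema in Listing 1.
SEARCHABLE = ("lastname", "firstname", "city", "address", "phone", "ismanager")

def build_query(answers: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return a parameterized SELECT and its bind values for the answers so far."""
    sql = "select " + ",".join(SEARCHABLE) + " from employee"
    clauses: List[str] = []
    params: List[str] = []
    for column, value in answers.items():
        if column not in SEARCHABLE:
            raise ValueError(f"unknown attribute: {column}")
        clauses.append(f"{column} = ?")  # the value is bound, never concatenated
        params.append(value)
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql, params

# In-memory stand-in for the employee table, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee "
             "(lastname, firstname, city, address, phone, ismanager)")
conn.executemany("insert into employee values (?,?,?,?,?,?)", [
    ("Smith", "John", "New York", "18 Broadway", "1234567", "N"),
    ("Smith", "John", "Miami", "123 Flagler St.", "2345671", "N"),
    ("Doe", "Jane", "Miami", "1 Ocean Dr.", "3456712", "Y"),
])

sql, params = build_query({"firstname": "John", "lastname": "Smith"})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(len(rows))  # prints 2
```

Checking column names against a fixed list matters here because column names cannot be bound as parameters; only the values can.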
Listing 2. XRT script for XI data retrieval
<?xml version="1.0" encoding="UTF-8"?>
<xrt:xrt xmlns:xrt="http://www.xrt.org"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xrt:rdbms2xml>
<xrt:locator xrt:name="d" xrt:url="jdbc:db2:employee"
xrt:driver="com.ibm.jdbc.app.DB2Driver" xrt:userid="foo"
xrt:password="bar"/>
<xrt:sqlsearch xrt:qid="q1">
<xrt:query>select lastname,firstname,city,address,phone,
ismanager from employee
where firstname = 'John' and lastname = 'Smith'</xrt:query>
</xrt:sqlsearch>
</xrt:rdbms2xml>
<xrt:xml2xml>
<xsl:stylesheet version="1.0" xmlns:xrt="http://www.xrt.org"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="store2xml">
<xsl:element name="entries">
<xsl:apply-templates select="q1" />
</xsl:element>
</xsl:template>
<xsl:template match="q1">
<xsl:element name="entry">
<xsl:element name="attr">
<xsl:attribute name="name">lastname</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="LASTNAME/@value" />
</xsl:attribute>
</xsl:element>
<xsl:element name="attr">
<xsl:attribute name="name">firstname</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="FIRSTNAME/@value" />
</xsl:attribute>
</xsl:element>
<xsl:element name="attr">
<xsl:attribute name="name">city</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="CITY/@value" />
</xsl:attribute>
</xsl:element>
<xsl:element name="attr">
<xsl:attribute name="name">address</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="ADDRESS/@value" />
</xsl:attribute>
</xsl:element>
<xsl:element name="attr">
<xsl:attribute name="name">phone</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="PHONE/@value" />
</xsl:attribute>
</xsl:element>
<xsl:element name="attr">
<xsl:attribute name="name">ismanager</xsl:attribute>
<xsl:attribute name="value">
<xsl:value-of select="ISMANAGER/@value" />
</xsl:attribute>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
</xrt:xml2xml>
</xrt:xrt>
Listing 3. Intermediate data XML generated from XI
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<entry>
<attr name="lastname" value="Smith" />
<attr name="firstname" value="John" />
<attr name="city" value="New York" />
<attr name="address" value="18 Broadway, New York, NY12000" />
<attr name="phone" value="123-456-9012" />
<attr name="ismanager" value="N" />
</entry>
<entry>
<attr name="lastname" value="Smith" />
<attr name="firstname" value="John" />
<attr name="city" value="Miami" />
<attr name="address" value="123 Flagler St., Palm Beach, FL23000" />
<attr name="phone" value="234-567-9012" />
<attr name="ismanager" value="N" />
</entry>
<!--48 more entries down here -->
</entries>
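The uniform format of Listing 3 is what lets the rest of the framework ignore where the data came from. As a rough sketch, and not the XI Engine's implementation, the following Python code normalizes a relational row and an LDAP-style entry into the same `entries`/`entry`/`attr` structure. The LDAP attribute names `sn`, `givenName`, and `l` are standard directory attributes, but the mapping table and the helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Hypothetical mapping from LDAP attribute names to GASF attribute names.
LDAP_NAMES = {"sn": "lastname", "givenName": "firstname", "l": "city"}

def from_row(row: Dict[str, str]) -> Dict[str, str]:
    """Normalize a relational row keyed by upper-case column names."""
    return {column.lower(): value for column, value in row.items()}

def from_ldap(entry: Dict[str, List[str]]) -> Dict[str, str]:
    """Normalize an LDAP-style entry; only the first value of each attribute is kept."""
    return {LDAP_NAMES[name]: values[0]
            for name, values in entry.items() if name in LDAP_NAMES}

def to_entries(records: Iterable[Dict[str, str]]) -> ET.Element:
    """Build the entries/entry/attr structure used in Listing 3."""
    entries = ET.Element("entries")
    for record in records:
        entry = ET.SubElement(entries, "entry")
        for name, value in record.items():
            ET.SubElement(entry, "attr", {"name": name, "value": value})
    return entries

# Hypothetical inputs from the two kinds of repository.
row = {"LASTNAME": "Smith", "FIRSTNAME": "John", "CITY": "New York"}
ldap_entry = {"sn": ["Smith"], "givenName": ["John"], "l": ["Miami"]}

root = to_entries([from_row(row), from_ldap(ldap_entry)])
print(ET.tostring(root, encoding="unicode"))
```

Once both sources produce this shape, the analysis and rendering steps need no knowledge of the repository type.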
Analysis Engine
Once the data has been retrieved as an XML document that contains 50 entries (see Listing 3), the Analysis Engine performs a statistical
analysis on the XML data by applying the XSL style sheet shown in Listing 4.
This style sheet can be used in any search application as long as the XML data
complies with the common format of Listing 3.
Listing 4. XSL for statistical analysis
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" />
<xsl:key name="attr-by-name" match="attr" use="@name" />
<xsl:key name="attr-by-name-value" match="attr"
use="concat(@name, '+', @value)" />
<xsl:template match="entries">
<attributelist instances="{count(entry)}">
<xsl:for-each select="entry/attr[generate-id() =
generate-id(key('attr-by-name', @name)[1])]">
<attribute name="{@name}">
<xsl:for-each select="key('attr-by-name', @name)
[generate-id() = generate-id(key('attr-by-name-value',
concat(@name, '+', @value))[1])]">
<instance value="{@value}"
occurrence="{count(key('attr-by-name-value',
concat(@name, '+', @value)))}" />
</xsl:for-each>
</attribute>
</xsl:for-each>
</attributelist>
</xsl:template>
</xsl:stylesheet>
The outcome of the statistical analysis is the XML document shown in Listing 5, which reveals the following:
- The total number of instances is 50
- Each instance has a different phone number
- All instances have the same
lastname ("Smith") and the same firstname ("John")
- None of them is a manager
- The number of different city locations and addresses is five
Listing 5. XML document with statistical analysis results
<?xml version="1.0" encoding="UTF-8"?>
<attributelist instances="50">
<attribute name="lastname">
<instance occurrence="50" value="Smith"/>
</attribute>
<attribute name="firstname">
<instance occurrence="50" value="John"/>
</attribute>
<attribute name="city">
<instance occurrence="8" value="WASHINGTON"/>
<instance occurrence="10" value="DALLAS"/>
<instance occurrence="15" value="AUSTIN"/>
<instance occurrence="1" value="New York"/>
<instance occurrence="16" value="Miami"/>
</attribute>
<attribute name="phone">
<instance occurrence="1" value="1234567"/>
<instance occurrence="1" value="2345671"/>
<!--48 more instances -->
</attribute>
<attribute name="address">
<instance occurrence="8" value="20 Burr Road "/>
<instance occurrence="10" value="1024 24ST"/>
<instance occurrence="15" value="3901 110Ave"/>
<instance occurrence="1" value="18 Broadway "/>
<instance occurrence="16" value="123 Flagler St."/>
</attribute>
<attribute name="ismanager">
<instance occurrence="50" value="N"/>
</attribute>
</attributelist>
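For readers less familiar with the key-based grouping in Listing 4, the same tally can be sketched in Python. This is an illustrative equivalent rather than part of GASF: it counts how often each value occurs for each attribute name and emits the `attributelist` structure of Listing 5. The function name `analyze` and the three-entry sample are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict

def analyze(entries: ET.Element) -> ET.Element:
    """Count the occurrences of every value, grouped by attribute name."""
    tallies: Dict[str, Counter] = {}
    for attr in entries.iter("attr"):
        tallies.setdefault(attr.get("name"), Counter())[attr.get("value")] += 1

    result = ET.Element("attributelist",
                        {"instances": str(len(entries.findall("entry")))})
    for name, counts in tallies.items():
        attribute = ET.SubElement(result, "attribute", {"name": name})
        for value, occurrence in counts.items():
            ET.SubElement(attribute, "instance",
                          {"value": value, "occurrence": str(occurrence)})
    return result

# Hypothetical three-entry sample in the format of Listing 3.
SAMPLE = """
<entries>
  <entry><attr name="city" value="Miami"/><attr name="ismanager" value="N"/></entry>
  <entry><attr name="city" value="Miami"/><attr name="ismanager" value="N"/></entry>
  <entry><attr name="city" value="Austin"/><attr name="ismanager" value="N"/></entry>
</entries>
"""

stats = analyze(ET.fromstring(SAMPLE))
print(ET.tostring(stats, encoding="unicode"))
```

An attribute with a single `instance` element, such as `ismanager` in this sample, has the same value everywhere and therefore cannot narrow the search.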
From the instance value distribution of attributes shown in
Listing 5, GASF
determines that it has to create another questionnaire to further drill down
for the searched object. The Analysis Engine generates the next questionnaire
by reorganizing the Initial Attribute List XML document shown in
Listing 6,
which defines the initial mapping between attributes and corresponding
questions for different media types.
Listing 6. Initial Attribute List XML document
<?xml version="1.0" encoding="UTF-8"?>
<gasf>
<attribute name="lastname">
<html><question>Enter the last name</question></html>
</attribute>
<attribute name="firstname">
<html><question>Enter the first name</question></html>
</attribute>
<attribute name="city">
<html><question>Enter the city</question></html>
</attribute>
<attribute name="address">
<html><question>Enter the mailing address</question></html>
</attribute>
<attribute name="phone">
<html><question>Enter the phone number</question></html>
</attribute>
<attribute name="ismanager">
<html><question>Is this person a manager?</question></html>
</attribute>
</gasf>
During this reorganizing process, attributes that have already been specified by the
user, such as lastname and firstname, are eliminated from the questionnaire; attributes that have a constant value for
all instances, such as ismanager, are dropped; the remaining
attributes are ordered so that the attribute with the widest distribution becomes the first question
in the questionnaire list. Listing 7 shows the generated questionnaire in XML format.
Listing 7. Questionnaire XML document
<?xml version="1.0" encoding="UTF-8"?>
<gasf>
<attribute name="phone">
<html><question>Enter the phone number</question></html>
</attribute>
<attribute name="address">
<html><question>Enter the mailing address</question></html>
</attribute>
<attribute name="city">
<html><question>Enter the city</question></html>
</attribute>
</gasf>
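The reorganizing rules can be sketched in Python as follows. The article does not define how "widest distribution" is measured or how ties are broken, so this sketch makes two assumptions: width is the number of distinct values, and attributes with equal width keep their order from the analysis document. The function names and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

def next_questions(analysis: ET.Element, answered: Set[str]) -> List[str]:
    """Order the attributes still worth asking about, widest distribution first."""
    candidates = []
    for attribute in analysis.findall("attribute"):
        name = attribute.get("name")
        distinct = len(attribute.findall("instance"))
        if name in answered or distinct <= 1:
            continue  # already answered, or the same value for every instance
        candidates.append((distinct, name))
    # The sort is stable, so attributes with equal width keep document order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates]

def build_questionnaire(initial: ET.Element, order: List[str]) -> ET.Element:
    """Copy the chosen attributes from the initial list in the given order."""
    by_name = {a.get("name"): a for a in initial.findall("attribute")}
    gasf = ET.Element("gasf")
    for name in order:
        if name in by_name:
            gasf.append(by_name[name])
    return gasf

# Hypothetical, shortened analysis result and initial attribute list.
ANALYSIS = """
<attributelist instances="4">
  <attribute name="lastname"><instance value="Smith" occurrence="4"/></attribute>
  <attribute name="city">
    <instance value="Miami" occurrence="3"/><instance value="Austin" occurrence="1"/>
  </attribute>
  <attribute name="phone">
    <instance value="1234567" occurrence="1"/><instance value="2345671" occurrence="1"/>
    <instance value="3456712" occurrence="1"/><instance value="4567123" occurrence="1"/>
  </attribute>
  <attribute name="ismanager"><instance value="N" occurrence="4"/></attribute>
</attributelist>
"""
INITIAL = """
<gasf>
  <attribute name="lastname"><html><question>Enter the last name</question></html></attribute>
  <attribute name="city"><html><question>Enter the city</question></html></attribute>
  <attribute name="phone"><html><question>Enter the phone number</question></html></attribute>
  <attribute name="ismanager"><html><question>Is this person a manager?</question></html></attribute>
</gasf>
"""

order = next_questions(ET.fromstring(ANALYSIS), answered={"lastname"})
questionnaire = build_questionnaire(ET.fromstring(INITIAL), order)
print(order)  # prints ['phone', 'city']
```

Asking about the attribute with the most distinct values first tends to split the candidates into the smallest groups, which serves the goal of minimizing question-answer cycles.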
Transformation Engine
Finally, the questionnaire XML document is rendered to the user according to the device
being used, by applying the appropriate XSL transformation style sheet. For
example, if the request originates from a PC Web browser, then the XSL style sheet shown in
Listing 8 is used to render an HTML Web
page.
Listing 8. Questionnaire XSL for transforming XML to HTML
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<xsl:param name="url"/>
<xsl:template match="gasf">
<html>
<form method="post" action="{$url}search">
<p>Choose one of the following questions: </p>
<xsl:apply-templates select="attribute"/>
<p><input type="submit" name="submit"/></p>
</form>
</html>
</xsl:template>
<xsl:template match="attribute">
<p>
<input type="radio" name="attribute" value="{@name}"/>
<xsl:value-of select="html/question"/>
</p>
</xsl:template>
</xsl:stylesheet>
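To make the result of this style sheet concrete, the sketch below builds the same form directly with Python's `xml.etree.ElementTree` instead of running XSLT, since the Python standard library does not include an XSLT processor. It illustrates the output structure only; the function `render_html` and the example URL are assumptions.

```python
import xml.etree.ElementTree as ET

def render_html(gasf: ET.Element, url: str) -> str:
    """Build the form that Listing 8 describes: one radio button per question."""
    html = ET.Element("html")
    form = ET.SubElement(html, "form", {"method": "post", "action": url + "search"})
    ET.SubElement(form, "p").text = "Choose one of the following questions: "
    for attribute in gasf.findall("attribute"):
        paragraph = ET.SubElement(form, "p")
        radio = ET.SubElement(paragraph, "input", {
            "type": "radio", "name": "attribute", "value": attribute.get("name")})
        # The question text follows the radio button inside the same paragraph.
        radio.tail = attribute.findtext("html/question", default="")
    submit = ET.SubElement(form, "p")
    ET.SubElement(submit, "input", {"type": "submit", "name": "submit"})
    return ET.tostring(html, encoding="unicode")

# Hypothetical questionnaire in the format of Listing 7.
QUESTIONNAIRE = """
<gasf>
  <attribute name="phone"><html><question>Enter the phone number</question></html></attribute>
  <attribute name="city"><html><question>Enter the city</question></html></attribute>
</gasf>
"""

page = render_html(ET.fromstring(QUESTIONNAIRE), "http://example.com/")
print(page)
```

The XSLT version keeps this mapping outside the application code, which is the point of the architecture; the Python version only shows what the mapping produces.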
Extensibility
It's easy to add more rendering formats with this system -- you just need to do two things:
- Add the media type and the
corresponding transformation in the Initial Attribute List XML. For
instance, if you want to add a WML format for PDAs, you need to add
wml as a subelement of each attribute element in Listing 6, and put
the transformation requirement in it. In the same fashion, if the user is
using a phone to interact with the system, you may want to add a vxml subelement.
Listing 9 shows an example of such an extended attribute element.
- Add a transformation XSL style sheet similar
to that in Listing 8 to transform the questionnaire XML in
Listing 7 for the new media
type. Listing 10 and
Listing 11, respectively, show sample transformation XSL style sheets for XML to WML and XML to VXML.
Listing 9. Example of Extended Initial Attribute
<attribute name="phone">
<html><question>Enter the phone number</question></html>
<wml><question>Tap phone number</question></wml>
<vxml option="1">
<question>press or say the phone number</question>
<type>digits</type>
<grammar src="builtin:grammar/digits?minlength=1;maxlength=7" mode="dtmf"></grammar>
<catch event="noinput nomatch">
<reprompt/>
</catch>
</vxml>
</attribute>
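Before a style sheet for a new media type is put to use, it can help to confirm that every attribute in the Initial Attribute List carries a question for that media type; otherwise the rendered questionnaire would contain empty choices. The check below is a minimal sketch, and the function `missing_questions` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

def missing_questions(initial: ET.Element, media_type: str) -> List[str]:
    """Names of attributes that have no question for the given media type."""
    return [attribute.get("name")
            for attribute in initial.findall("attribute")
            if attribute.find(f"{media_type}/question") is None]

# Hypothetical attribute list in which only "phone" has been extended for WML.
INITIAL_LIST = """
<gasf>
  <attribute name="phone">
    <html><question>Enter the phone number</question></html>
    <wml><question>Tap phone number</question></wml>
  </attribute>
  <attribute name="city">
    <html><question>Enter the city</question></html>
  </attribute>
</gasf>
"""

attributes = ET.fromstring(INITIAL_LIST)
print(missing_questions(attributes, "wml"))   # prints ['city']
print(missing_questions(attributes, "html"))  # prints []
```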
Listing 10. Sample questionnaire XSL for transforming XML to WML
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<xsl:param name="url"/>
<xsl:template match="gasf">
<xsl:text disable-output-escaping="yes">
<![CDATA[<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
"http://www.wapforum.org/DTD/wml_1.1.xml">]]>
</xsl:text>
<wml>
<card id="questionary" title="Questionary">
<do type="accept">
<go method="post" href="{$url}">
<postfield name="attribute" value="$(attribute)" />
</go>
</do>
<p>Choose one of the following questions: </p>
<p>
<select name="attribute">
<xsl:apply-templates select="attribute"/>
</select>
</p>
</card>
</wml>
</xsl:template>
<xsl:template match="attribute">
<option value="{@name}">
<xsl:value-of select="wml/question"/>
</option>
</xsl:template>
</xsl:stylesheet>
Listing 11. Sample questionnaire XSL for transforming XML to VoiceXML for phone
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" />
<xsl:param name="url" />
<xsl:template match="gasf">
<vxml version="1.0">
<meta name="Content-Type" content="text/x-vxml" />
<property name="caching" value="safe" />
<form id="main" scope="dialog">
<block>
<prompt>Choose one of the following questions:
<break msecs="10" />
</prompt>
<goto next="#check_searchable_attribute" />
</block>
</form>
<menu id="check_searchable_attribute" scope="document" dtmf="false">
<prompt>
<enumerate>
<break msecs="10" />
Press<value expr="_dtmf" mode="tts" />or say
<value expr="_prompt" mode="tts" />
</enumerate>
</prompt>
<xsl:apply-templates select="attribute" />
<choice dtmf="9" next="#check_searchable_attribute">
nine for Repeating the menu</choice>
<choice dtmf="0" next="#quit">
zero to Exit the Search System
<grammar type="application/x-jsgf">quit | exit | goodbye
</grammar>
</choice>
<noinput>
Please select at least one option
<reprompt />
</noinput>
<nomatch>
Sorry, that is not an option. Try again
<reprompt />
</nomatch>
<catch event="error.badfetch">
<prompt>Somewhere something went wrong, let's try again
</prompt>
<goto next="#check_searchable_attribute" />
</catch>
</menu>
<form id="quit" scope="document">
<block>
<prompt>Thank You for using the Voice Search System, Goodbye
</prompt>
</block>
<block>
<exit />
</block>
</form>
</vxml>
</xsl:template>
<xsl:template match="attribute">
<choice dtmf="{vxml/@option}" next="{$url}?attribute={@name}">
<xsl:value-of select="vxml/@option" />for
<xsl:value-of select="vxml/question" />
</choice>
</xsl:template>
</xsl:stylesheet>
It is also very easy to use GASF to build other search systems using various
existing data sources. Again, two steps are required:
- Create the script that contains a data source definition similar to that in
Listing 2.
- Create the Initial Attribute List XML similar to that in
Listing 6, which contains all the attributes to search
on.
Conclusion
The methodology demonstrated by GASF can be used in a variety of applications -- such as
Web content management systems, knowledge management systems, and
business-to-business transactions -- where you have the need to compose an XML object from various data sources, and then process and render it on the fly.
We believe that as XSLT technology matures, such processing can be performed more efficiently
and extensively.
The primary advantage of leveraging XSLT to enable applications is its flexibility and
low cost of development. For applications that do not need to support high volume transactions, XSL
transformation can provide a quick, easy, and cost-saving solution.
Resources
- IBM alphaWorks features a wide range of "alpha code" technologies, available for download at the earliest stages of development. Try one such tool, XML Integrator (XI Engine), referenced in this article.
- For more information about XSL, visit the W3C's XSL page.
- For more information about XSLT, visit the W3C's XSLT page.
- For more information about XPath, visit the W3C's XPath page.
- For more information about XLink, visit the W3C's XLink page.
- Get a handle on XSL-FO with two developerWorks tutorials by Doug Tidwell --
XSL-FO basics and
XSL-FO advanced techniques. Also, Doug's HTML-to-FO conversion guide features a wide range of XSLT templates to speed your conversions of HTML elements to FO and thence to PDF (February 2003).
- For more information about WML, visit the W3Schools WML page.
- For more information about VoiceXML, visit the ZVON Voice XML page.
- Find more XML resources on the developerWorks XML zone.
- IBM WebSphere Studio provides a suite of tools that automate XML development, both in Java and in other languages. It is closely integrated with the WebSphere Application Server, but can also be used with other J2EE servers.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
About the authors
Chen Shu is a software engineer with IBM Internet Technology group, where she plays an active role
in e-business application prototyping using emerging Internet technologies. Her areas of interest include
XML, Web services, and pervasive computing. You can contact Chen at
chenshu@us.ibm.com.
Nianjun Zhou is an advisory software engineer with IBM Internet Technology group. He has worked on several projects related to Grid computing, XML-based content management, XML, and relational database/LDAP transformation. His interests include using computer technologies to develop new applications that can enhance the efficiency of knowledge sharing and information management in general. You can contact him at jzhou@us.ibm.com.
Dikran S Meliksetian is a senior technical staff member with IBM Internet Technology group. He has previously been involved in the development of Content Management solutions, and is currently involved in a number of Grid Computing projects. You can contact him at Dikran_Meliksetian@us.ibm.com.