Level: Introductory
Brett McLaughlin (brett@newInstance.com), Enhydra Strategist, Lutris Technologies
01 Dec 2000
In this follow-up article on JAXP, Sun's Java API for XML Parsing, the author analyzes the newest version, 1.1, which includes updated support for the SAX and DOM standards. With the addition of TRaX, JAXP 1.1 provides Java and XML developers an indispensable tool in writing vendor-neutral code for parsing and transforming XML documents.
If you are a frequent reader of developerWorks' XML zone, you may be
a little puzzled by the presence of another JAXP article. Just about a
month ago, I wrote a piece entitled "All About JAXP." In that article, I gave a complete explanation of JAXP, the Java
API for XML Parsing, how it works, and how it could help you out in dealing
with XML data in Java programs. That article covered the 1.0 release of
JAXP.
Familiar territory
So why the heck am I writing about JAXP again? I'm a member of the
expert group for JAXP 1.1, and we're nearing completion on the 1.1 specification.
While most "point releases" (in which a version moves from 1.0 to 1.1,
or 2.2 to 2.3) result in minor, or at least simple, changes to existing APIs,
the 1.1 release of JAXP is significantly different from its predecessor.
In fact, I'll spend only about one third of this article covering new methods
on existing classes and functionalities; the rest of the article will focus
on completely new classes and features of the 1.1 version of JAXP. In other
words, there's just so much new (and good) in JAXP 1.1 that I couldn't
wait to give you a taste of what's coming.
If you are new to JAXP, if you're using it now, or if you've been holding
off on using it until it matures a bit more, this article is right for
you. I'll cover the modifications to the 1.0 version of the API and spend
a good bit of time talking about TRaX (Transformations for XML). TRaX is the API
that has been incorporated into JAXP to allow a vendor-neutral means of
making XSL transformations; this complements the existing ability of JAXP
to allow for vendor-independence in XML parsing. I suggest you read my
first JAXP article, take a quick coffee break, and dive into this discussion
of JAXP 1.1.
Enhancing the parsing API
Many of the changes to the JAXP API have centered around parsing, which
makes sense, given that the "P" in JAXP stands for "parsing." But the most
significant changes in JAXP 1.1 center around XML transformations, which
I will cover later in this article. In terms of the existing JAXP functionality,
the changes are fairly minor. The biggest addition is support for SAX 2.0,
which went final in May of 2000, and DOM Level 2, which is still being
finalized. The previous version of JAXP only supported SAX 1.0 and DOM
Level 1. This lack of updated standards has been one of the biggest
criticisms of JAXP 1.0.
In addition to updating JAXP to the newest versions of SAX and DOM,
several small changes have been made in the API (as discussed in my last
article). Almost all of these changes are important ones that are the result
of feedback from the various companies and individuals on the JAXP expert
group. All of these changes also deal with configuring the parsers returned from
JAXP's two factories, SAXParserFactory and DocumentBuilderFactory.
I'll cover these, as well as the update in standards support for SAX and
DOM, now.
Updating the standards
The most anticipated change from JAXP 1.0 to 1.1 is the updated support
for the popular SAX and DOM standards. SAX, the Simple API for XML, had
a version 2.0 release in May of 2000 that provided greatly
enhanced support for XML namespaces, among other items. This namespace
support enables the use of numerous other XML vocabularies,
such as XML Schema, XLink, and XPointer. While it was possible to use these
vocabularies in SAX 1.0, the burden was on the developer to split an element's
local (or qualified) name from its namespace, and keep track of namespaces
throughout the document. SAX 2.0 provides this information to the developer,
dramatically simplifying the process of carrying out these programming
tasks. The same goes for DOM Level 2: namespace support, as well as a wealth of other methods on the DOM classes, is available.
While DOM Level 2 has not been finalized, JAXP 1.1 supports the specification
as it now stands. As minor changes get introduced in the final stages of
the DOM standard, JAXP will, of course, include these modifications.
The good news is that these changes are generally transparent to the
developer using JAXP. In other words, these standards updates happen somewhat
"automatically," without user intervention. Simply specifying a SAX 2.0-compliant
parser to the SAXParserFactory and a DOM Level 2-compliant parser
to the DocumentBuilderFactory class takes care of the update.
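To make the DOM Level 2 namespace support concrete, here is a minimal sketch that parses a small document through JAXP and reads the root element's namespace URI and local name. The class name, method name, and sample document are hypothetical, chosen only for illustration; the code assumes a JAXP implementation with a DOM Level 2 parser behind it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of JAXP.
class DomNamespaceSketch {

    // Parses the XML text and returns "{namespaceURI}localName" for the root element.
    static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP factories are not namespace-aware unless asked to be.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        // getNamespaceURI() and getLocalName() are DOM Level 2 additions.
        return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bk:book xmlns:bk=\"urn:example:books\">"
                   + "<bk:title>JAXP</bk:title></bk:book>";
        System.out.println(describeRoot(xml));
    }
}
```

Nothing in the sketch names a parser vendor; the factory decides which DOM implementation is used.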
The road to SAX 2.0
There are a few significant changes related to these standards updates. In SAX 1.0, the parser interface that was implemented by vendors and XML parser projects was org.xml.sax.Parser. The
JAXP class SAXParser, then, provided a method to get this underlying
implementation class through the getParser() method. The signature
for that method looks like:
public interface SAXParser {
    public org.xml.sax.Parser getParser();
    // Other methods
}
However, in the change from SAX 1.0 to 2.0, the Parser interface
was deprecated and replaced with a new interface, org.xml.sax.XMLReader.
This made the getParser() method essentially useless for obtaining
an implementation of the SAX 2.0 XMLReader interface. To support SAX 2.0,
a new method has been added to the JAXP SAXParser class. Not surprisingly, this method is named getXMLReader() and
looks like:
public interface SAXParser {
    public org.xml.sax.XMLReader getXMLReader();
    public org.xml.sax.Parser getParser();
    // Other methods
}
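As a rough sketch of how getXMLReader() might be used, the following obtains the underlying XMLReader from a namespace-aware SAXParser and registers a SAX 2.0 handler that records each element's namespace URI and local name. The class name, method name, and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of JAXP.
class XmlReaderSketch {

    // Returns one "{uri}localName" entry per start tag, in document order.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        // The SAX 2.0 object underneath the JAXP wrapper.
        XMLReader reader = parser.getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // SAX 2.0 reports the namespace URI and local name separately;
                // the URI is an empty string for elements in no namespace.
                names.add("{" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<c:catalog xmlns:c=\"urn:example:catalog\">"
                   + "<c:item/><note/></c:catalog>";
        System.out.println(elementNames(xml));
    }
}
```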
In this same way, the class that was used in SAX 1.0 to implement callbacks
was org.xml.sax.HandlerBase, and an instance of that class was
supplied to all of the JAXP 1.0 parse() methods. But due to some
additional SAX 2.0 deprecations and changes, this class is no longer used
in SAX 2.0. Instead, it has been replaced by a new class, org.xml.sax.helpers.DefaultHandler.
To accommodate this change, all of the parse() methods on the
SAXParser class have been complemented with versions of the same
method that take an instance of the DefaultHandler class to support
SAX 2.0. To help you see this difference, the methods I'm talking about
are shown in Listing 3:
public interface SAXParser {
    // The SAX 1.0 parse methods
    public void parse(File file, HandlerBase handlerBase);
    public void parse(InputSource inputSource, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase,
                      String systemID);
    public void parse(String uri, HandlerBase handlerBase);
    // The SAX 2.0 parse methods
    public void parse(File file, DefaultHandler defaultHandler);
    public void parse(InputSource inputSource, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, DefaultHandler defaultHandler,
                      String systemID);
    public void parse(String uri, DefaultHandler defaultHandler);
    // Other methods
}
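A minimal sketch of the SAX 2.0 style of parsing, using the parse(InputStream, DefaultHandler) variant from the listing above, could look like this. The handler subclass, the helper method, and the sample document are hypothetical; only the JAXP and SAX calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of JAXP.
class CountingHandlerSketch {

    // A DefaultHandler subclass overrides only the callbacks it cares about.
    static class ElementCounter extends DefaultHandler {
        int elements;
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the XML text and returns the handler holding the collected data.
    static ElementCounter count(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                     handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ElementCounter result = count("<order><item>pen</item><item>ink</item></order>");
        System.out.println(result.elements + " elements, text: " + result.text);
    }
}
```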
Having all these methods for parsing may seem a bit confusing, but it's
only tricky if you're working with both versions of SAX. If you
are using SAX 1.0, you'll be working with the Parser interface
and HandlerBase class, and it will be obvious which methods to
use. Similarly, when using SAX 2.0, it will be obvious that the methods
that accept DefaultHandler instances and return XMLReaders
will be used. So take all this as a reference and don't worry too much
about it! There are some other changes to the SAX portion of the API, as
well.
Changes in existing SAX classes
To complete the discussion of the changes to existing JAXP functionality,
I need to go over a few new methods that are available to JAXP SAX users.
First, the SAXParserFactory class has a new method, setFeature().
As you may recall from JAXP 1.0, the SAXParserFactory class allows
configuration of SAXParser instances returned from the factory.
In addition to the methods already available (setValidating()
and setNamespaceAware()), this new method allows SAX 2.0 features
to be requested for new parser instances. SAX 2.0 provides features
that allow vendors to create specific functionality for their parsers;
users can then interact with these features through SAX. For example, a
user may request the http://apache.org/xml/features/validation/schema
feature, which allows XML Schema validation to be turned on or off. This
can now be performed directly on a SAXParserFactory, which is
shown in Listing 4:
SAXParserFactory myFactory = SAXParserFactory.newInstance();
// Turn on XML Schema validation
myFactory.setFeature("http://apache.org/xml/features/validation/schema", true);
// Now get an instance of the parser with schema validation enabled
SAXParser parser = myFactory.newSAXParser();
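The schema feature in the listing is specific to Apache parsers, so a parser from another vendor may reject it. As a more portable sketch, the following sets one of the standard SAX 2.0 features (external general entities) on the factory and reads it back. The class name and helper method are hypothetical, and the sketch assumes the underlying parser recognizes this feature; a parser that does not will throw an exception from setFeature().

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical illustration class; not part of JAXP.
class FactoryFeatureSketch {

    // A standard SAX 2.0 feature name (not vendor-specific).
    static final String EXTERNAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";

    // Returns a factory whose parsers will not load external general entities.
    static SAXParserFactory configuredFactory() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Throws if the underlying parser does not recognize or support the feature.
        factory.setFeature(EXTERNAL_ENTITIES, false);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = configuredFactory();
        // Query the value back from the factory.
        System.out.println(EXTERNAL_ENTITIES + " = " + factory.getFeature(EXTERNAL_ENTITIES));
        // Parsers created from this factory inherit the setting.
        SAXParser parser = factory.newSAXParser();
        System.out.println("parser created: " + (parser != null));
    }
}
```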
Of course, a getFeature() method is provided to complement the setFeature() method and allow querying of particular features. This method
returns a simple boolean value. In addition to SAX allowing features to be set (with true or
false values), properties also can be set. In SAX, properties
are names associated with actual Java objects. For example, using an instance
of a SAX parser, you could set the property http://xml.org/sax/properties/lexical-handler,
assigning that property an implementation of a SAX LexicalHandler
interface. That implementation would then be used by the parser for lexical
processing. Because properties like this lexical one are parser-specific
instead of factory-specific (as features were), a setProperty()
method is provided on the JAXP SAXParser class, rather than on
the SAXParserFactory class. And as with features, a getProperty()
complement is provided to return the value associated with a specific property,
also on the SAXParser class.
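The lexical-handler property described above can be sketched as follows. For brevity, the example uses org.xml.sax.ext.DefaultHandler2, a convenience class from a later SAX 2 release that implements LexicalHandler; that choice is an assumption of this sketch, and any LexicalHandler implementation would do. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical illustration class; not part of JAXP.
class LexicalPropertySketch {

    // Returns the text of every comment found in the document.
    static List<String> comments(String xml) throws Exception {
        final List<String> found = new ArrayList<>();

        // DefaultHandler2 implements LexicalHandler, so it can receive comment events.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Properties are set on the parser instance, not on the factory.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(comments("<doc><!-- draft --><p/></doc>"));
    }
}
```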
Updates in DOM
A number of new methods are available for the DOM portion of JAXP.
These methods have been added to existing JAXP classes to support both DOM Level 2 options and common configuration situations that have arisen in the last year. I won't cover all of these options and the corresponding methods here since many are fairly obscure (they are used only in very unusual
situations) and won't be needed in many of your applications. You are certainly
encouraged to check these out in the latest JAXP specification online (see
the Resources section).
With the coverage of standards updates, SAX changes, and additional DOM methods complete, you're ready to read about the most substantial changes in
JAXP 1.1 -- the TRaX API.
The TRaX API
So far, I've covered the changes to XML parsing in JAXP. Now I can
turn to XML transformations in JAXP 1.1. Perhaps the most exciting development
in the newest version of Sun's API is that JAXP 1.1 will allow vendor-neutral
XML document transformations. If you're unfamiliar with XML transformations
and XSLT (XSL Transformations), check out the developerWorks tutorials (see Resources).
While this vendor neutrality may expand on the current vision of JAXP as
simply a parsing API, it is a much needed facility since XSL processors currently
employ different methods and means for enabling user and developer interaction.
In fact, XSL processors have even greater variance across providers than
their XML parser counterparts.
Originally, the JAXP expert group sought to provide a simple Transform
class with a few methods to allow specification of a style sheet and subsequent
document transformations. This first effort turned out to be rather shaky,
but I'm happy to report that we (the JAXP expert group) are going much
further in our continued efforts. Scott Boag and Michael Kay, two of
today's XSL processor gurus (working on Apache Xalan and SAXON, respectively), have worked with others to develop TRaX. This supports a much wider array of options and features, and provides complete support for almost all XML transformations -- all under the JAXP umbrella.
As with the parsing portion of JAXP, performing XML transformations requires
three basic steps:
- Obtain a Transformer factory
- Retrieve a Transformer
- Perform operations (transformations)
Working with the factory
For the transformation portion of JAXP, the factory you will work with
is called javax.xml.transform.TransformerFactory. This class is
analogous to the SAXParserFactory and DocumentBuilderFactory
classes that I already covered in both my first JAXP article and earlier
in this article. Of course, simply obtaining a factory instance to work with
is a piece of cake:
TransformerFactory factory = TransformerFactory.newInstance();
Once the factory is available, various options can be
set upon the factory. Those options will affect all instances of Transformer
(which I'll cover in a minute) created by that factory. (By
the way, you can also obtain instances of javax.xml.transform.Templates
through the TransformerFactory. Templates are an advanced JAXP
concept, and one I don't have space to cover here.)
The first of the options you can work with are attributes. These
are not XML attributes, but are similar to the properties I discussed in
reference to XML parsers. Attributes allow options to be passed to the
underlying XSL processor, which may be Apache Xalan, SAXON, or Oracle's
XSL processor. They are largely vendor-dependent. As on the parsing side
of JAXP, a setAttribute() method is provided along with a counterpart,
getAttribute(). Like setProperty(), the former takes
an attribute name and Object value. And like getProperty(),
the latter takes an attribute name and returns the associated Object
value.
Setting an ErrorListener is the second option available. Defined
in the javax.xml.transform.ErrorListener interface, an ErrorListener
allows problems in transformation to be caught and handled programmatically.
If you're familiar with SAX, this interface looks remarkably similar to
the org.xml.sax.ErrorHandler interface:
package javax.xml.transform;
public interface ErrorListener {
    public void warning(TransformerException exception)
        throws TransformerException;
    public void error(TransformerException exception)
        throws TransformerException;
    public void fatalError(TransformerException exception)
        throws TransformerException;
}
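One possible implementation of this interface is sketched below: it records every problem in a list and rethrows only fatal errors. The class name and the record-then-continue policy are assumptions made for illustration; a real application might log, wrap, or rethrow differently.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;

// Hypothetical illustration class; not part of JAXP.
class CollectingErrorListener implements ErrorListener {

    // Every reported problem, in the order it was reported.
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(TransformerException exception) {
        messages.add("warning: " + exception.getMessage());
    }

    @Override
    public void error(TransformerException exception) {
        // Recoverable: record it and let the processor continue.
        messages.add("error: " + exception.getMessage());
    }

    @Override
    public void fatalError(TransformerException exception) throws TransformerException {
        // Not recoverable: record it, then stop processing by rethrowing.
        messages.add("fatal: " + exception.getMessage());
        throw exception;
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        CollectingErrorListener listener = new CollectingErrorListener();
        factory.setErrorListener(listener);
        System.out.println("listener installed: " + (factory.getErrorListener() == listener));
    }
}
```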
Creating an implementation of this interface, filling in the three callback
methods, and passing it to the setErrorListener() method on the TransformerFactory
instance you are working with sets you up to deal with any errors.
Finally, a method is provided to set and retrieve the URI (uniform
resource identifier, often a URL) resolver for the instances generated
by the factory. The interface defined in javax.xml.transform.URIResolver
also behaves similarly to a SAX counterpart, org.xml.sax.EntityResolver.
The interface has a single method:
package javax.xml.transform;
public interface URIResolver {
    public Source resolve(String href, String base)
        throws TransformerException;
}
This interface, when implemented, allows URIs found in XML constructs
like xsl:import and xsl:include to be handled. Returning
a Source (which I'll cover in a moment), you can instruct your
transformer to search for the specified document in various locations when
a particular URI is encountered. For example, when an include of the URI
http://www.oreilly.com/oreilly.xsl is encountered, you might instead
return the local document oreilly.xsl and prevent the need for
network access. Implementations of the URIResolver interface can
be set using the TransformerFactory's setURIResolver()
method, and retrieved using the getURIResolver() method.
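The local-copy idea described above might be sketched like this. The resolver keeps an in-memory map from URIs to style sheet text and hands back a StreamSource for any URI it knows; the class name, the register() helper, and the placeholder style sheet are hypothetical. Returning null tells the processor to try resolving the URI itself.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration class; not part of JAXP.
class LocalCopyResolver implements URIResolver {

    // Maps a URI as written in xsl:import or xsl:include to local style sheet text.
    private final Map<String, String> localCopies = new HashMap<>();

    void register(String href, String stylesheetText) {
        localCopies.put(href, stylesheetText);
    }

    @Override
    public Source resolve(String href, String base) {
        String text = localCopies.get(href);
        if (text == null) {
            // Unknown URI: let the processor resolve it in its usual way.
            return null;
        }
        // Keep the original URI as the system ID so relative references still work.
        return new StreamSource(new StringReader(text), href);
    }

    public static void main(String[] args) {
        LocalCopyResolver resolver = new LocalCopyResolver();
        resolver.register("http://www.oreilly.com/oreilly.xsl",
            "<xsl:stylesheet version=\"1.0\" "
            + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>");

        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver);
        System.out.println("resolver installed: " + (factory.getURIResolver() == resolver));
    }
}
```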
Once you have set the options of your choice, you can obtain
an instance, or instances, of a Transformer through the newTransformer()
method of the factory:
// Get the factory
TransformerFactory factory = TransformerFactory.newInstance();
// Configure the factory
factory.setErrorListener(myErrorListener);
factory.setURIResolver(myURIResolver);
// Get a Transformer to work with, with the options specified
Transformer transformer = factory.newTransformer(new StreamSource("sheet.xsl"));
As you see, this method takes the style sheet as input to use in all
transformations for that Transformer instance. In other words,
if you wanted to transform a document using style sheet A and style sheet
B, you would need two Transformer instances, one for each style sheet.
If you wanted to transform multiple documents with the same style sheet
(call it style sheet C), however, you would only need a single Transformer
instance, associated with style sheet C.
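The "one Transformer per style sheet" rule can be sketched with an in-memory style sheet standing in for style sheet C. The class name, helper methods, and sample documents are hypothetical. Note that a single Transformer is meant to be used by one thread at a time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration class; not part of JAXP.
class ReuseTransformerSketch {

    // Stands in for "style sheet C"; produces plain text output.
    static final String SHEET_C =
        "<xsl:stylesheet version=\"1.0\" "
        + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/greeting\">"
        + "Hello, <xsl:value-of select=\"@name\"/>!"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Builds a Transformer bound to style sheet C.
    static Transformer compile() throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTransformer(new StreamSource(new StringReader(SHEET_C)));
    }

    // Runs one document through the given Transformer and returns the output.
    static String apply(Transformer transformer, String xml) throws Exception {
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Transformer transformer = compile();
        // The same instance handles several documents, one after another.
        System.out.println(apply(transformer, "<greeting name=\"A\"/>"));
        System.out.println(apply(transformer, "<greeting name=\"B\"/>"));
    }
}
```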
Transforming XML
Once you have an instance of a Transformer, you can go about
actually performing XML transformations. This consists of two basic steps:
- Set the XSL style sheet to use
- Perform the transformation, specifying the XML document and result target
As I discussed above, the first step is really the easiest. A style sheet
must be supplied when obtaining a Transformer instance from the
factory. You specify the location of this style sheet by providing
a javax.xml.transform.Source. The Source
interface, which you've seen in a few code samples so far, is the means
of locating an input -- be it a style sheet, document, or other information
set. TRaX not only provides the Source interface, but also three
concrete implementations:
- javax.xml.transform.stream.StreamSource
- javax.xml.transform.dom.DOMSource
- javax.xml.transform.sax.SAXSource
The first of these, StreamSource, reads input from some type
of I/O device. Constructors are provided for accepting an InputStream,
a Reader, or a String system ID as input. Once created, the StreamSource can be passed in to the Transformer for use. This will probably
be the Source implementation you use most often. It's great for
reading a document from a network, input stream, user input, or other somewhat
static representation.
The next Source, DOMSource, provides for reading from
an existing DOM tree. It supplies a constructor for taking in a DOM org.w3c.dom.Node,
and will read from that Node when used. This is ideal for supplying
an existing DOM tree to a transformation, perhaps if parsing has already
occurred and an XML document is already in memory as a DOM structure.
SAXSource provides for reading input from SAX producers. This
Source implementation takes a SAX org.xml.sax.InputSource,
optionally together with an org.xml.sax.XMLReader, and uses the events
from these sources as input. This is ideal for situations in which a SAX parser is already
in use, and callbacks are set up and need to be triggered prior to transformations.
Once you've obtained an instance of a Transformer (by providing
the style sheet to use through an appropriate Source), you're ready
to perform a transformation. To accomplish this, the transform()
method is used (no surprise there) as follows:
// Get the factory
TransformerFactory factory = TransformerFactory.newInstance();
// Configure the factory
factory.setErrorListener(myErrorListener);
factory.setURIResolver(myURIResolver);
// Get a Transformer to work with, with the options specified
Transformer transformer = factory.newTransformer(new StreamSource("sheet.xsl"));
// Perform transformation on document A, and print out result
transformer.transform(new StreamSource("documentA.xml"),
                      new StreamResult(System.out));
The transform() method takes two arguments: a Source
implementation, and a javax.xml.transform.Result implementation.
You should already be seeing the symmetry in how this works and have an
idea about the functionality of the Result interface. The Source
should provide the XML document to be transformed, and the Result
should provide an output target for the transformation. Like Source,
there are three concrete implementations provided with TRaX and JAXP of
the Result interface:
- javax.xml.transform.stream.StreamResult
- javax.xml.transform.dom.DOMResult
- javax.xml.transform.sax.SAXResult
The StreamResult takes as a construction mechanism either an
OutputStream (like System.out in the example above),
or a Writer. DOMResult takes a DOM Node to output
the transformation to (presumably as a DOM org.w3c.dom.Document),
and SAXResult takes a SAX ContentHandler to fire callbacks
to, resulting from the transformed XML. All are analogous to their Source
counterparts, and you can easily figure out their uses from those counterparts.
While the example above shows transforming from a stream to a stream,
any combination of sources and results is possible. Here are a few examples:
// Perform transformation on document A, and print out result
transformer.transform(new StreamSource("documentA.xml"),
                      new StreamResult(System.out));
// Transform from SAX and output results to a DOM Node
// (myDocumentBuilder is a DocumentBuilder obtained from a DocumentBuilderFactory;
// newDocument() is an instance method, so it cannot be called statically)
transformer.transform(new SAXSource(
                          new InputSource("http://www.oreilly.com/catalog.xml")),
                      new DOMResult(myDocumentBuilder.newDocument()));
// Transform from DOM and output to a File
transformer.transform(new DOMSource(myDomTree),
                      new StreamResult(new FileOutputStream("results.xml")));
// Use a custom source and result (JDOM)
transformer.transform(new org.jdom.trax.JDOMSource(myJdomDocument),
                      new org.jdom.trax.JDOMResult(new org.jdom.Document()));
As you can see, TRaX provides tremendous flexibility in moving from
various input types to various output types, and in using XSL style sheets
in a variety of formats -- files, in-memory DOM trees, SAX readers, and so
on.
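The mixing of sources and results can be tried without any files or network access. The sketch below is an illustration under assumptions, not the article's own code: it uses the no-argument newTransformer(), which yields an identity transformer (a copy with no style sheet), to go from a SAXSource to a DOMResult and then from a DOMSource back to text. The class name, method names, and sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of JAXP.
class SourceResultSketch {

    // SAX in, DOM out: the transformer builds a DOM tree from SAX events.
    static Document saxToDom(String xml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        identity.transform(new SAXSource(new InputSource(new StringReader(xml))), result);
        return (Document) result.getNode();
    }

    // DOM in, stream out: the transformer serializes the tree to text.
    static String domToString(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // An output property: leave out the <?xml ...?> declaration.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = saxToDom("<catalog><item>TRaX</item></catalog>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(domToString(doc));
    }
}
```

Swapping the identity transformer for one created from a style sheet changes nothing about how the sources and results are wired together.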
Scratching the surface
A number of other items in TRaX are important, but they are not as commonly
used as those shown here, and there isn't room here to list them all. I
do recommend you check out the TRaX API when the JAXP specification has
included it (something that should happen any day now); it is a rich
and robust API for XML transformations. You can play around with output
properties, set error handling (not only for XSL transformations, but
also for locating input sources) and find a variety of other goodies in the API. Enjoy,
and let us (the expert group) know what you think!
Warnings
Before I wrap up, a warning is in order. In case you read this article
three months from now, download JAXP 1.1, and get compiler and runtime
errors, keep in mind that this article is being written as JAXP 1.1 is
being finalized. As with any early-release piece, things can change as
this article ages -- even as it goes from my laptop into the developerWorks
production process. In other words, the methods and features covered here
are current as I write them, but the JAXP specification is still
somewhat in flux. Bearing that in mind, consider the concepts here important,
yet be prepared for a method or two to undergo a name change or perhaps
even go through a slight alteration in behavior. Still, the core ideas
outlined here will appear in JAXP 1.1 in some form. So count on what is
detailed here to be correct in concept, if not exactly in detail, by the
time JAXP 1.1 goes final in both its specification and reference implementation.
Summary
You now have the lowdown on what's coming in the next version of JAXP.
The public draft of the specification, in its final form, should be available
close to the end of the year 2000. The actual reference implementation
should follow shortly, with all loose ends tied up by the first quarter
of 2001. You'll want to be careful when looking up resources on JAXP
since the current draft of the specification (as of early November 2000) does
not include the TRaX API that I discussed in this article. The specification
is being revamped as I write this, so an updated specification will be
available shortly. For those of you who have been waiting to use JAXP (a fairly wise move
considering the limitations of the 1.0 version), consider this the time
to dive in head first. In my articles and my book, Java and XML,
I gave a rather tenuous endorsement of JAXP 1.0, due to its shortcomings
with regard to SAX 2.0 and DOM Level 2. I'm happily endorsing JAXP 1.1
now as a major step forward. Java and XML developers will find it an indispensable
tool in writing vendor-neutral code for parsing and transforming XML documents.
So check it out, and get your applications in gear.
Resources
About the author
Brett McLaughlin works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he recently founded the JDOM
project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server, and a co-founder of the Apache Turbine project. Brett is currently on the expert group working on the JAXP 1.1 specification and release. You can contact him at brett@newInstance.com.