Simplify document handler programs with the SAX parser

A design strategy for SAX parsers


Level: Advanced

Gianluigi Colaiacomo (gianluigi.colaiacomo@it.ibm.com), Software Developer, IBM

14 Apr 2004

Sometimes the code of a SAX document handler can become cumbersome, poorly structured, and difficult to maintain, especially for complex XML structures with many different elements. This article presents a design strategy that addresses this problem, and, therefore, can improve the quality and the maintainability of your code.

To use the SAX interface to XML files, you need to write a document handler class, which specifies the behavior for the different XML tags inside the following handler methods:

  • startElement
  • endElement
  • characters

This implies that each time a new type of element is added to the XML file, you must change these methods by adding the behavior for the new element. These methods must also be changed if some change or fix is needed for an already existing element. The size and complexity of the document handler class increases with the number of different XML tags, up to the point where readability and maintenance can become very difficult.

In this article, I show you how to improve the design by isolating all the code that relates to each XML tag in a separate class. The document handler class is generalized through the Java reflection mechanism, while a common interface defines the methods that each element class must implement.

Note: This tip uses Xerces-Java 2, but the concepts are applicable to any SAX-compliant parser.

The design strategy

The first step in the design strategy is to write a generalized document handler. Whenever a tag is encountered in the XML file being parsed, this handler invokes a specific class for the corresponding XML element. In my strategy, this element class is defined in a separate source file and implements the methods relevant to element processing (the counterparts of startElement, endElement, and characters). To let the SAX document handler call these external classes dynamically, you can use the Java reflection feature, namely the Class.forName method, as shown in Listing 1.


Listing 1. The SAX XML generalized document handler
import org.xml.sax.*;
import org.xml.sax.helpers.*;
  
public class SaxParseSample extends DefaultHandler {

String lastName;
  
 static XMLReader parser;  

 public static void main (String args[]) throws Exception
  {

  /**
    * Create a parser.
    */
	try 
	{   
	   parser = XMLReaderFactory.createXMLReader();
	}
	catch (Exception e) 
	{   
	   // Without a parser nothing can be parsed, so report the cause and stop.
	   System.err.println("error: Unable to instantiate parser (" + e + ")");
	   return;
	}
	SaxParseSample handler = new SaxParseSample();
	parser.setContentHandler(handler);
	parser.setErrorHandler(handler);
	parser.parse(new InputSource(args[0]));
  }



  /**
    * Handle the start of an element.
    */
  public void startElement (String uri, String name,String qName, Attributes atts)
  {
    lastName = new String(name); 
   	try 
  	{
	   Class classToRun = Class.forName(name, true, 
	           ClassLoader.getSystemClassLoader()); 
	   XmlElementsInterface etlElement =  
	           (XmlElementsInterface) classToRun.newInstance(); 
	   etlElement.startXmlElement(uri, name, qName, atts);
  	}
  	catch (Exception e) 
  	{  
	   System.out.println( e );
  	};
 
  }

 
  /**
    * Handle the end of an element.
    */
  public void endElement (String uri, String name, String qName)
  {
  	try 
  	{
	   Class classToRun = Class.forName(name, true, 
	           ClassLoader.getSystemClassLoader());
	   XmlElementsInterface etlElement =  
	           (XmlElementsInterface) classToRun.newInstance();
	   etlElement.endXmlElement(uri, name, qName);
  	}
  	catch (Exception e) 
  	{
	   System.out.println( e );
  	};

  }


  /**
    * Handle character data.
    */

  public void characters (char ch[], int start, int length)
  {
   	try 
  	{
	   Class classToRun = Class.forName(lastName, true, 
	           ClassLoader.getSystemClassLoader());
	   XmlElementsInterface etlElement =  
	           (XmlElementsInterface) classToRun.newInstance();
	   etlElement.XmlCharacters(lastName, ch , start, length);
  	}
  	catch (Exception e) 
  	{
	   System.out.println( e );
  	}
  }
}

As you can see in Listing 1, no explicit reference is made to any XML tag. Instead, each element is processed by loading and running an instance of the class whose name is contained in the name parameter received from the SAX parser. The programmer is responsible for ensuring that this class exists; otherwise Class.forName throws a ClassNotFoundException, which Listing 1 simply catches and prints.

By doing this, you obtain a dynamic interpretation of the name of the class to load. However, to get this flexibility, you are paying a small cost in terms of time (see additional performance considerations at the end of this article).

Additionally, in this example, the element's local name identifies the class. This means that it might not work when using multiple namespaces with colliding elements. In such cases, an extended naming convention must be used. For instance, the class names can be defined as namespace_localname or equivalent expressions.
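One possible way to build such a namespace_localname class name is sketched below. The ElementClassNames helper, its "ns_" prefix, and the rule of replacing illegal characters with underscores are my own assumptions for illustration; they are not part of Listing 1, and any convention that yields unique, legal Java class names would do.

```java
// Hypothetical helper (not part of Listing 1): derives a Java class name
// from a namespace URI and a local element name.
final class ElementClassNames {

    private ElementClassNames() { }

    // Replaces every character that cannot appear in a Java identifier with '_'.
    static String sanitize(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        return sb.toString();
    }

    // Without a namespace the sanitized local name is used on its own;
    // otherwise the result is "ns_<sanitized uri>_<sanitized local name>".
    // The "ns_" prefix guarantees that the name starts with a legal character.
    static String classNameFor(String uri, String localName) {
        if (uri == null || uri.isEmpty()) {
            return sanitize(localName);
        }
        return "ns_" + sanitize(uri) + "_" + sanitize(localName);
    }
}
```

With a helper like this, the handler methods would pass classNameFor(uri, name) to Class.forName instead of name. Note that two different URIs can still map to the same sanitized string, so the convention should be checked against the namespaces you actually use.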

If you summarize the sequence of actions at run time -- for example, for the start of an XML element -- you have:

  1. The parser reads the XML tag for the start of an element and invokes the startElement method of the document handler.
  2. The startElement method loads and runs the class whose local name corresponds to the tag found.
  3. This external class is responsible for the actions to be performed each time a new XML element is started.

The same logic also works for the end of an XML element and the processing of the element content.

For this mechanism to work, the external class must implement the XmlElementsInterface interface, which is also used for casting the loaded class. Listing 2 shows the XmlElementsInterface interface:


Listing 2. The XmlElementsInterface interface
public interface XmlElementsInterface {
	
	public void startXmlElement (String uri, String name, String qName, 
	        org.xml.sax.Attributes atts) ;
	public void endXmlElement (String uri, String name, String qName) ; 
	public void XmlCharacters (String lastName , char ch[], int start, int length);  
	
}

In Listing 2, only the startXmlElement, endXmlElement, and XmlCharacters methods have been included. However, other methods can be added.

With the interface shown in Listing 2, you can now implement the external classes for all of the XML elements. For example, for the XML fragment in Listing 3, three external classes -- Commands.class, Comment.class, and Syscommand.class -- need to be created for the three corresponding XML elements:


Listing 3. XML fragment
<Commands>
<Comment>The dir command displays a list of the files in a directory</Comment>
<Syscommand>DIR >> c:\\directory.log</Syscommand>
<Comment>The write command is used to edit the content of a file</Comment>
<Syscommand>write.exe c:\\directory.log</Syscommand>
</Commands>
	

A minimal implementation for each of the classes can be:


Listing 4. Minimal implementation of the external XML element classes
public class Syscommand implements XmlElementsInterface {

	public void startXmlElement (String uri, String name, String qName,
	         org.xml.sax.Attributes atts) 
   		{
   		  	System.out.println(  "Start element " + qName); 		
   		};
     
   	public void endXmlElement (String uri, String name, String qName)  
   		{
   		  	System.out.println(  "End element " + qName);
   		};
   		
   		
   	public void XmlCharacters (String lastName , char ch[], int start, int length)  
   		{
   		  	System.out.println(  "Content of element " + lastName);
   		};

  		
}
	

The definitions for the other element classes -- Commands.class and Comment.class -- are similar.
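When most element classes need only one or two of the callbacks, an abstract adapter class with empty default methods can reduce the repetition. The following is a minimal sketch under that assumption: XmlElementAdapter is a hypothetical name that does not appear in the listings, and the interface from Listing 2 is repeated only so that the block compiles on its own.

```java
import org.xml.sax.Attributes;

// Repeated from Listing 2 so that this sketch is self-contained.
interface XmlElementsInterface {
    void startXmlElement(String uri, String name, String qName, Attributes atts);
    void endXmlElement(String uri, String name, String qName);
    void XmlCharacters(String lastName, char ch[], int start, int length);
}

// Hypothetical adapter: every callback does nothing by default,
// so subclasses override only the methods they care about.
abstract class XmlElementAdapter implements XmlElementsInterface {
    public void startXmlElement(String uri, String name, String qName, Attributes atts) { }
    public void endXmlElement(String uri, String name, String qName) { }
    public void XmlCharacters(String lastName, char ch[], int start, int length) { }
}

// An element class for <Comment> that only handles the element content.
// In a real project it would be public and live in its own source file.
class Comment extends XmlElementAdapter {
    @Override
    public void XmlCharacters(String lastName, char ch[], int start, int length) {
        System.out.println("Comment text: " + new String(ch, start, length));
    }
}
```

Because Comment still implements XmlElementsInterface through the adapter, the cast in the document handler keeps working unchanged.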

With these definitions, the parsing of the XML fragment results in the following console log:


Listing 5. Console log created by the default methods
Start element Commands
Content of element Commands

Start element Comment
Content of element Comment
End element Comment

Start element Syscommand
Content of element Syscommand
End element Syscommand

Start element Comment
Content of element Comment
End element Comment

Start element Syscommand
Content of element Syscommand
End element Syscommand

End element Commands
	

You can now change the behavior for one of the elements without changing either the document handler code or the code for the other elements. All the changes are encapsulated in a single class with no interference to or from all the others. For example, you can change the Syscommand so that it executes the commands passed to it. The Syscommand class would now become:


Listing 6. Enhancing Syscommand.java
public class Syscommand implements XmlElementsInterface {

   public void XmlCharacters (String lastName , char ch[], int start, int length) 	
   {
	String command = new String(ch, start, length).trim();
	// Skip whitespace-only character data (for example, line breaks between tags).
	if (command.length() == 0)
	{
	   return;
	}
	System.out.println(  "Executing command: " + command);
	try
	{  
	   // Run the command through the Windows command interpreter.
	   Runtime rt = Runtime.getRuntime();
	   String[] callAndArgs = { "cmd.exe" ,"/C", command };
	   Process child = rt.exec(callAndArgs);
	   int rc = child.waitFor();
	   System.out.println("Process exit code is: " + rc);
	}
	catch(Exception e)
	{ 
	   System.err.println("Exception " + e );
	}
   } 		
// . . .
// The other methods remain unchanged
// . . . 
}

The behavior of this system changes when the code is updated inside the Syscommand class. The SAX parser, the document handler, and the external classes for the other elements do not need any changes.

This approach can greatly increase the quality of your code, especially when the structure of the XML file becomes more complex and many other XML elements must be handled.





Improvements to the mechanism

To make full use of the basic approach described above, a couple of limitations need to be addressed. First, what happens if new XML elements are added, or a tag name is mistyped when the XML file is written? In this basic approach, a ClassNotFoundException is raised. To prevent this, or at least to provide meaningful immediate debugging information, it's a good idea to work in conjunction with XML Schema checking (or DTD checking). The elements used must be defined in a schema first; the parser checks this schema, signaling any discrepancies with the parsed file. The syntax for the schema checking can vary slightly among different SAX parsers. For example, in the Xerces-Java 2 implementation you need to add the following lines to the main method of your SaxParseSample class (shown in Listing 1):


Listing 7. Add XML Schema checking to the main method
	...
      parser.setFeature("http://xml.org/sax/features/validation",true);
      parser.setFeature("http://apache.org/xml/features/validation/schema",true);
	...
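Turning validation on is only half of the job: DefaultHandler ignores recoverable errors, which include validation errors, unless its error method is overridden. The sketch below shows one way to make such errors stop the parse. It is an illustration under my own assumptions: StrictHandler and ValidationDemo are hypothetical names, and for the sake of a self-contained example it uses DTD validation through JAXP's SAXParserFactory with an internal DTD rather than the Xerces schema feature of Listing 7.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rethrows validation errors instead of ignoring them.
class StrictHandler extends DefaultHandler {
    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }
}

class ValidationDemo {

    // Returns true if the XML text is well-formed and valid against its own DTD.
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);   // request DTD validation
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return true;
        } catch (SAXParseException e) {
            // Raised by StrictHandler.error or by a fatal (well-formedness) error.
            System.err.println("Invalid document: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not validation results.
            throw new IllegalStateException(e);
        }
    }
}
```

In SaxParseSample the same effect could be obtained by overriding error in the handler itself, so that an undeclared element is reported by the parser before any attempt is made to load a class for it.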
	

A second limitation is that the basic approach is valid only if no data needs to be passed from one element to another: each time an element is parsed, a new instance of its class is created, and no state is retained. In the example, the XML fragment contains a sequence of commands to be executed, and each one is independent of the previous one. But what about making the execution of a command conditional on the successful exit of the preceding one? This is not possible with the initial approach above, because an XML element that is being processed knows nothing about the preceding element.

Therefore, the model must be modified for more complex situations in order to allow parameters to be passed from one element to another.

You can pass information between programs in a number of ways. For example, you can add the use of Java Naming and Directory Interface (JNDI) directory services to your programs.
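When all element classes run inside the same JVM, a much lighter option is an in-process registry. The following is a minimal sketch of that idea, not the mechanism used in the rest of this article: ElementContext, ContextDemo, and the "commandResult" key are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process registry shared by all element class instances.
final class ElementContext {

    // ConcurrentHashMap does not accept null keys or values.
    private static final Map<String, Object> VALUES =
            new ConcurrentHashMap<String, Object>();

    private ElementContext() { }

    static void put(String key, Object value) { VALUES.put(key, value); }

    static Object get(String key) { return VALUES.get(key); }

    static void clear() { VALUES.clear(); }
}

class ContextDemo {

    // Called by one element instance after it has run its command.
    static void recordResult(int exitCode) {
        ElementContext.put("commandResult", exitCode == 0 ? "Successful" : "Failed");
    }

    // Called by the next element instance before it runs its own command.
    // When nothing has been recorded yet, the command is allowed to run.
    static boolean previousSucceeded() {
        Object previous = ElementContext.get("commandResult");
        return previous == null || "Successful".equals(previous);
    }
}
```

A static registry like this is shared by every parse in the JVM, so it should be cleared before each document; a directory service remains the better fit when the data must be visible to other processes.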





Use JNDI directory services

Directory services provide a way to store and retrieve information, such as serializable objects, in a distributed environment. JNDI is a standard interface to several directory service implementations; it is defined in the javax.naming package and its subpackages, including javax.naming.directory, which must be imported into the programs that use it.

Listing 8 refers to a Lightweight Directory Access Protocol (LDAP) implementation. It is also assumed that the environment information needed by this implementation has already been stored in an env hash table.

With the assumptions listed above, you must add the following code to the element classes defined in this article in order to exchange a commandResult string among them:


Listing 8. Changes needed to share an object
try { 	    
    DirContext dirCtx = new InitialDirContext(env);

    String commandResult = new String("Successful");
    dirCtx.bind("cn=result" , commandResult);
	
} catch (NamingException e) {
    System.out.println("Operation failed: " + e);
}
 	    

Another element class can then read the shared string with the following code:


Listing 9. Changes required to read a shared object
try { 	    
    DirContext dirCtx = new InitialDirContext(env);

    String previousResult = (String) dirCtx.lookup("cn=result");
	
} catch (NamingException e) {
    System.out.println("Operation failed: " + e);
} 	    





Performance considerations

The Java reflection mechanism used in this design strategy can adversely affect the performance of an application, so keep it in mind if you observe any performance degradation. For simple applications, the proposed mechanism does not have a major impact. As an example, on a test run, the extra elapsed time for an XML file with 10,000 elements was less than half a second on a 600 MHz processor. The additional execution time is proportional to the number of elements in the XML file. Therefore, the impact is likely to be a factor only in complex applications where iterations and recursive invocations are heavily used.
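If the reflection cost does become noticeable, one option is to resolve each element class only once and reuse the result. The sketch below illustrates the idea; ElementClassCache is a hypothetical helper that is not part of Listing 1, and whether it helps in practice should be measured, because the class loader already keeps track of classes it has loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: remembers the Class object for each element name so that
// Class.forName is called only the first time a name is encountered.
final class ElementClassCache {

    private final Map<String, Class<?>> cache = new HashMap<String, Class<?>>();
    private int lookups = 0;

    Class<?> resolve(String name) throws ClassNotFoundException {
        Class<?> cls = cache.get(name);
        if (cls == null) {
            cls = Class.forName(name, true, ClassLoader.getSystemClassLoader());
            cache.put(name, cls);
            lookups++;
        }
        return cls;
    }

    // Number of times Class.forName was actually called.
    int lookupCount() {
        return lookups;
    }
}
```

The document handler would keep one ElementClassCache instance as a field and call resolve(name) wherever Listing 1 calls Class.forName. A new instance is still created for each callback, so this saves only the lookup, not the instantiation.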





Conclusion

In this article, I have shown you how document handler code, developed for handling the different elements in a parsed XML file, can be simplified by creating separate classes for each element. In this way, you can create smaller classes and methods, which makes your code easier to test, debug, and modify, and thereby improves your overall quality and productivity.

The Java reflection methods Class.forName and Class.newInstance can be used to create generalized parsing code that calls the element classes through a common interface. The classes for all the elements defined in the XML file must be created and must implement the XmlElementsInterface interface.

Furthermore, in the example, I showed you how to add XML Schema checking inside the generalized parsing code, SaxParseSample, to trap any new or wrong XML elements that could generate a ClassNotFoundException. Also, you saw how to use JNDI directory services to share variables or objects among different pieces of code.






About the author

Gianluigi Colaiacomo is based in Rome, Italy, and has been working for many years in the IBM Tivoli software development laboratory. His main interests are in software design techniques and automated code generation. You can contact him at gianluigi.colaiacomo@it.ibm.com.



