
XML for Data: Reuse it or lose it

XML reuse in the enterprise


Level: Introductory

Kevin Williams (kevin@blueoxide.com), CEO, Blue Oxide Technologies, LLC

01 Mar 2003

One of the great features of XML is that you can easily reuse your designs all the way down to the component level. In this first installment of a three-part series, columnist Kevin Williams provides an overview of XML reuse in enterprise-level solutions, with examples in both XML and XML Schema. You can share your thoughts on this article with the author and other readers in the accompanying discussion forum.

In enterprise-level solutions, one of the most challenging problems facing XML designers is how to design structures that can be reused. In this column, I take a look at some of the historical approaches to reusing serialized data, and then show how XML allows you to break from tradition and take a more flexible approach to your document designs.

In the beginning: pre-XML reuse strategies

Before XML was created, serialized data typically took the form of flat files (setting aside SGML for now, as it was complex enough to be used only in specialized situations). These files could take any one of a number of forms, the most common being the repeating record approach. In this approach, each record was a fixed number of characters long, and a separate document, such as the sample in Table 1, described the layout of each record.

Table 1. Sample flat-file description

Starting position  Length  Name      Format          Description
1                  30      Name      string          The name of the customer. Right-padded with spaces.
31                 10      Balance   numeric(10,2)   The customer's balance. Implied decimal. Left-padded with zeroes.
41                 6       Due date  date            The customer's bill due date. MMDDYY.

Two records in this format look like this:

Kevin Williams                0000010817031103Anne Yastremski               0000007723031303

Several things about this approach made it difficult to use. For instance, without a document describing the content, the file itself was difficult to comprehend. The example here isn't too challenging (assuming that you know it was created near the beginning of 2003), but more complex files would be virtually impossible to interpret without the supporting documentation. Also, any change to the record layout would break every parser designed to read it. For instance, suppose I expanded the year in the due date to four digits. The record length would grow from 46 characters to 48, and every parser would have to be modified to account for the change.
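To make that fragility concrete, here is a minimal Python sketch of a parser hard-wired to the Table 1 layout (converted to zero-based offsets). The names `LAYOUT`, `RECORD_LENGTH`, and `parse_records` are hypothetical, and the code assumes every two-digit year falls in the 2000s.

```python
from datetime import date
from decimal import Decimal

# Field layout from Table 1 as (name, zero-based start, length).
LAYOUT = (("name", 0, 30), ("balance", 30, 10), ("due_date", 40, 6))
RECORD_LENGTH = 46  # 30 + 10 + 6 characters per record


def parse_records(data: str) -> list:
    """Split a flat file into fixed-length records and decode each field."""
    if len(data) % RECORD_LENGTH != 0:
        raise ValueError(f"length {len(data)} is not a multiple of {RECORD_LENGTH}")
    records = []
    for offset in range(0, len(data), RECORD_LENGTH):
        raw = data[offset:offset + RECORD_LENGTH]
        fields = {name: raw[start:start + length] for name, start, length in LAYOUT}
        due = fields["due_date"]  # MMDDYY
        records.append({
            "name": fields["name"].rstrip(),              # drop the right padding
            "balance": Decimal(fields["balance"]) / 100,  # implied two decimal places
            # Assumption: two-digit years are always 20xx.
            "due_date": date(2000 + int(due[4:6]), int(due[0:2]), int(due[2:4])),
        })
    return records


sample = ("Kevin Williams".ljust(30) + "0000010817" + "031103"
          + "Anne Yastremski".ljust(30) + "0000007723" + "031303")
for record in parse_records(sample):
    print(record)
```

Every offset is baked into the code. If the due date became MMDDYYYY, each record would be 48 characters, and this parser would no longer work as written until `RECORD_LENGTH`, the date slicing, and the year handling were all updated.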

Unless the receiving system happened to be directly compatible with the sending system (such as two COBOL systems sharing a copybook whose PIC clauses describe the data), there was no easy way to validate the document contents; the fields essentially had to be validated by hand (that is, with custom code). And finally, the document design was one-size-fits-all. If the receiving system just wanted to know the total outstanding balance across all customers, it would still have to receive and discard many bytes of extraneous data.
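For comparison, here is roughly what that hand-written validation might look like for a single record in the Table 1 layout. This is a minimal Python sketch: `validate_record` and its messages are hypothetical, and the rules simply restate the Format column of Table 1.

```python
from datetime import datetime

DIGITS = set("0123456789")


def validate_record(raw: str) -> list:
    """Return the problems found in one Table 1 record (an empty list means valid)."""
    if len(raw) != 46:
        return [f"record is {len(raw)} characters, expected 46"]
    name, balance, due = raw[0:30], raw[30:40], raw[40:46]
    errors = []
    if not name.strip():
        errors.append("name is blank")
    if not set(balance) <= DIGITS:  # numeric(10,2): ten digits, implied decimal point
        errors.append(f"balance {balance!r} must be ten digits")
    try:
        datetime.strptime(due, "%m%d%y")  # MMDDYY; also rejects impossible dates
    except ValueError:
        errors.append(f"due date {due!r} is not a valid MMDDYY date")
    return errors


print(validate_record("Kevin Williams".ljust(30) + "0000010817" + "031103"))
print(validate_record("Kevin Williams".ljust(30) + "00000108.7" + "023003"))
```

Each rule here is code that someone has to write, test, and keep in step with the layout document by hand.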





Typical XML reuse strategies

With XML serialization, you can address many of the problems of flat files. In the XML arena, your description of the file's content becomes an XML Schema, and the file itself becomes an XML document, as shown in Listing 1.


Listing 1. Equivalent XML Schema and XML instance

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="customers">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element ref="customer" maxOccurs="unbounded"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <xsd:element name="customer">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="name">
               <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                     <xsd:maxLength value="30"/>
                  </xsd:restriction>
               </xsd:simpleType>
            </xsd:element>
            <xsd:element name="balance">
               <xsd:simpleType>
                  <xsd:restriction base="xsd:decimal">
                     <xsd:totalDigits value="10"/>
                     <xsd:fractionDigits value="2"/>
                  </xsd:restriction>
               </xsd:simpleType>
            </xsd:element>
            <xsd:element name="dueDate" type="xsd:date"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
</xsd:schema>

<customers>
   <customer>
      <name>Kevin Williams</name>
      <balance>108.17</balance>
      <dueDate>2003-03-11</dueDate>
   </customer>
   <customer>
      <name>Anne Yastremski</name>
      <balance>77.23</balance>
      <dueDate>2003-03-13</dueDate>
   </customer>
</customers>
		

Let's see how this approach addresses the problems with flat files that were identified earlier. First, XML documents are relatively self-describing, so even without the XML Schema document you could interpret this document with a fairly high degree of confidence. Next, the structured nature of the serialization makes it more tolerant of changes. If you were to add a field, for example, an existing parser might be able to accept the new file without modification, depending on how that parser was written.
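Here is a minimal Python sketch of that second point, using the standard library's `xml.etree.ElementTree`. The data follows the instance in Listing 1 (with dates written in xsd:date form); the `<phone>` element is a hypothetical new field, and `read_customers` is an illustrative name. Because the code looks fields up by element name rather than by character position, the extra element is simply ignored.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Customer data in the shape of Listing 1, plus a hypothetical new <phone>
# field on the first customer to simulate a change in the document design.
DOC = """<customers>
   <customer>
      <name>Kevin Williams</name>
      <balance>108.17</balance>
      <dueDate>2003-03-11</dueDate>
      <phone>555-0100</phone>
   </customer>
   <customer>
      <name>Anne Yastremski</name>
      <balance>77.23</balance>
      <dueDate>2003-03-13</dueDate>
   </customer>
</customers>"""


def read_customers(xml_text: str) -> list:
    """Pick out fields by element name; unrecognized elements are never read."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("name"), Decimal(c.findtext("balance")))
            for c in root.findall("customer")]


print(read_customers(DOC))
```

A parser written to walk children by position, or to reject unknown elements, would not be this forgiving; that is the "depending on how the parser was written" caveat.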

In addition, document validation is now built in because of the XML Schema. Any system that can understand XML Schema documents (and that's most systems these days) can validate the data against the constraints described there at parse time, with no additional code.

Unfortunately, this design approach doesn't address the final concern: the one-size-fits-all problem. Most XML Schema design efforts to date have mimicked the approach taken by flat-file designers in the past: build a single structure and attempt to use that structure in all situations. If a serialization contains information that you're not interested in, you have to discard that information, chewing up valuable bandwidth and processor time. If you want a different serialization, the designers have to go back to the drawing board and create a new structure that requires an entirely different parser.

However, the hierarchical nature of XML allows you to take a different approach to XML design, component-level reuse, which I look at next.





Component-level reuse: an overview

Think of an XML document as a set of nested containers. The outermost container is the root element. Its children appear as containers inside it, their children appear inside them, and so on, until the innermost containers hold only text values. (Attributes can be thought of as labels on specific containers that describe their contents, but let's not stretch the metaphor too far right now.) For example, you might have the structures shown in Figures 1 and 2, each of which contains a customer element:


Figure 1. Customer list example

Figure 2. Invoice example
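The nested-container view maps directly onto how XML libraries expose a document. As a small illustration (a Python sketch using `xml.etree.ElementTree`; `show_containers` is a hypothetical helper), the function below walks a fragment shaped like Listing 1 and lists each container indented inside its parent, showing text only for the innermost containers. Attributes, the "labels" in the metaphor, are ignored here.

```python
import xml.etree.ElementTree as ET


def show_containers(elem: ET.Element, depth: int = 0) -> list:
    """Return one indented line per element; leaf elements also show their text."""
    text = (elem.text or "").strip()
    label = f"{elem.tag} = {text!r}" if len(elem) == 0 else elem.tag
    lines = ["  " * depth + label]
    for child in elem:  # each child is a container nested inside this one
        lines.extend(show_containers(child, depth + 1))
    return lines


doc = ET.fromstring(
    "<customers><customer><name>Kevin Williams</name>"
    "<balance>108.17</balance></customer></customers>"
)
print("\n".join(show_containers(doc)))
```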

At first glance, these structures seem incompatible. They describe completely different data concepts (one describes a customer list, the other a single invoice) and contain different information. But if you build two completely separate structures for these two documents, you miss a reuse opportunity: you may be able to reuse the customer element in both. To do so, however, the two customer elements have to be syntactically and semantically identical. Not only must they have the same contents, but those contents must also have the same meaning each time they appear.

Great, you might say, so I've shared the customer element. What's the advantage of doing so? Component-level reuse has several benefits; let's briefly touch on three of them.

Benefit 1

First, the nature of XML and XML parsers makes it very easy to treat your containers (or elements) as black boxes. In other words, you can take an element that you know is reused somewhere else and copy it without worrying about its contents. Let's say, for instance, I have the customer list document shown earlier in Listing 1 and I want to create an invoice for a particular customer. Once I've identified that customer's element in the customer list (by matching on the customer ID or name, for example), I can easily copy that element and all of its children and add it to my invoice document. I don't need to know what other information is embedded in the customer element. The customer element could include detailed address information, a summary of the customer's orders over the past year, or even the customer's favorite flavor of ice cream. To the code, it doesn't matter. Sharing elements leads to simpler, more manageable code, because the contents of the customer element don't have to be deserialized and reserialized to move from one document to the next.
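Here is a minimal Python sketch of that black-box copy using `xml.etree.ElementTree` and `copy.deepcopy`. The `<invoice>` root, the `<favoriteFlavor>` child, and the `build_invoice` helper are all hypothetical (the invoice structure from Figure 2 isn't reproduced here), and matching is done on the customer name.

```python
import copy
import xml.etree.ElementTree as ET

# A customer list in the shape of Listing 1; <favoriteFlavor> is a hypothetical
# extra field that the copying code never needs to know about.
CUSTOMER_LIST = """<customers>
   <customer>
      <name>Kevin Williams</name>
      <balance>108.17</balance>
      <favoriteFlavor>Mint chip</favoriteFlavor>
   </customer>
   <customer>
      <name>Anne Yastremski</name>
      <balance>77.23</balance>
   </customer>
</customers>"""


def build_invoice(customer_list_xml: str, customer_name: str) -> ET.Element:
    """Create an <invoice> that embeds a verbatim copy of one <customer> element."""
    root = ET.fromstring(customer_list_xml)
    for customer in root.findall("customer"):
        if customer.findtext("name") == customer_name:
            invoice = ET.Element("invoice")
            # Copy the whole subtree as-is; its children are never inspected.
            invoice.append(copy.deepcopy(customer))
            return invoice
    raise KeyError(customer_name)


invoice = build_invoice(CUSTOMER_LIST, "Kevin Williams")
print(ET.tostring(invoice, encoding="unicode"))
```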

Benefit 2

Second, reusing elements lets you get much more out of typical approaches to XML presentation. Because XSLT templates are matched against individual elements, guaranteeing that an element is syntactically and semantically equivalent across different source documents enables you to reuse XSLT style sheet fragments. For instance, suppose that whenever I display customer information on my Web site I want the customer name to be bold, the address to be italicized, and so on. If I store this processing code in one place (say, as a template in a customer.xslt file), I can use xsl:include in every style sheet that needs to display customer information. Then, when the product manager demands that I change the customer name from black text to navy, I can change the customer.xslt file once and the change will automatically apply everywhere a customer appears on a Web page.
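Python's standard library has no XSLT processor, so the sketch below is only an analogue of the xsl:include idea, not XSLT itself: one function owns how a `<customer>` element is rendered, and two page builders call it. The function names, the `CUSTOMER_NAME_COLOR` setting, and the HTML markup are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

CUSTOMER_NAME_COLOR = "navy"  # change it here and every page picks it up


def render_customer(customer: ET.Element) -> str:
    """The single shared rendering rule for a <customer> element."""
    html = f'<b style="color:{CUSTOMER_NAME_COLOR}">{escape(customer.findtext("name", ""))}</b>'
    address = customer.findtext("address")
    if address:  # italicize the address when one is present
        html += f" <i>{escape(address)}</i>"
    return html


def render_customer_list(customers: ET.Element) -> str:
    """Page 1: a list of every customer, each rendered by the shared rule."""
    items = "".join(f"<li>{render_customer(c)}</li>" for c in customers.findall("customer"))
    return f"<ul>{items}</ul>"


def render_invoice(invoice: ET.Element) -> str:
    """Page 2: an invoice header that reuses the same customer rule."""
    return f"<div>Bill to: {render_customer(invoice.find('customer'))}</div>"


customers = ET.fromstring("<customers><customer><name>Kevin Williams</name></customer></customers>")
invoice = ET.fromstring("<invoice><customer><name>Kevin Williams</name></customer></invoice>")
print(render_customer_list(customers))
print(render_invoice(invoice))
```

The payoff only holds because both documents carry an identical `<customer>` element; if the invoice used a differently shaped customer, the shared rule would no longer apply.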

Benefit 3

Third, reusing components in XML designs allows you to reuse serialization code. Since I now know that the customer element will be the same in each of my target documents, I can create a Customer object with a serialize method that returns an XML document fragment. Then whenever I need a customer to appear in an XML document (for instance, as part of my invoice or as part of my customer list), I can use the same code to build the necessary element. This approach reduces code redundancy and greatly simplifies troubleshooting and upgrades.
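Here is a minimal Python sketch of such a Customer object, with fields mirroring Listing 1. The class and the `<invoice>` root are hypothetical, and this `serialize` returns an ElementTree `Element` rather than a text fragment (an assumption; `ET.tostring` would turn it into text if needed).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Customer:
    """Hypothetical Customer object whose fields mirror Listing 1."""
    name: str
    balance: Decimal
    due_date: date

    def serialize(self) -> ET.Element:
        """Build the one shared <customer> fragment used in every document."""
        elem = ET.Element("customer")
        ET.SubElement(elem, "name").text = self.name
        ET.SubElement(elem, "balance").text = str(self.balance)
        ET.SubElement(elem, "dueDate").text = self.due_date.isoformat()  # xsd:date form
        return elem


kevin = Customer("Kevin Williams", Decimal("108.17"), date(2003, 3, 11))

# The same method feeds both a customer list and a (hypothetical) invoice.
customer_list = ET.Element("customers")
customer_list.append(kevin.serialize())
invoice = ET.Element("invoice")
invoice.append(kevin.serialize())

print(ET.tostring(customer_list, encoding="unicode"))
print(ET.tostring(invoice, encoding="unicode"))
```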





Conclusion

In this column, I looked at how you can reuse XML designs not only at the document level, but also down to the component level. I showed you the advantages of component-level reuse and how it can simplify code and shorten development cycles. In the next column, I'll identify the different types of reusable components in XML document designs and show you some practical examples of each.






About the author

Kevin Williams is the CEO of Blue Oxide Technologies, LLC, a company that designs XML and Web service creation software. Visit their Web site at http://www.blueoxide.com. Kevin can be reached for comment at kevin@blueoxide.com.



