Working XML: UML, XMI, and code generation, Part 1

Design XML vocabularies with UML tools

Level: Introductory

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

31 Mar 2004

In this first article in a new series on UML and XML schema development, Benoît discusses the motivations for modeling XML schemas with UML. He also introduces XML Metadata Interchange (XMI) and sketches out a strategy for deriving XML schemas automatically from UML models.

Since XML has become mainstream, a lot of interest in the design of XML applications has emerged. More specifically, many organizations want to integrate the design of XML applications with the design of their other applications. Adopting one common methodology -- or at least one common set of tools -- is a worthwhile exercise.

As far as XML goes, design activities are centered around the data model. Indeed, because XML is a markup language, it is concerned solely with the organization of information -- unlike, say, the Java language which deals both with the data model (class hierarchy) and data manipulation (methods).

This article is the first in a new series for the Working XML column that will explore the use of UML modeling tools, such as IBM Rational Rose, together with XSLT to design XML applications. In this introductory article, I will discuss the basics of data modeling and introduce the techniques that I will cover in the next three articles.

Data modeling

The Concise Oxford Dictionary (Oxford University Press, 2001) has no fewer than seven definitions of the noun "model." For the purposes of this column series, the following definition is appropriate: "a simplified (often mathematical) description of a system etc., to assist calculations and predictions." The three keywords in this definition are simplified, description, and assist.

According to this definition, a model is a description of a system. This statement is crucial; the model is not the system itself but a formalized representation of the system. In the specific case of XML, the system consists of documents encoded according to a specific vocabulary.

The second aspect of the definition is that a model is a simplified representation. It is not as complex or as rich as the system being modeled. Many systems are designed to tackle complex problems so they are complex by nature. For example, look at the complexity of a vocabulary like DocBook: It is designed for publishing technical books and documentation (the Linux documentation is published in DocBook, among others). Because technical books and documentation are complex, DocBook is very complex (see Resources).

Yet humans are somewhat limited in the amount of information they can process at any one time. When most people work on a complex issue, they like (or need) to break it down into smaller, more manageable issues. Models are built to address that need. A model simplifies a complex system by exposing only some aspects of the system.

The last keyword in the definition is assist. Models are not built in a vacuum; they serve a very specific purpose: to help the designers reason about a system. A model is not imbued with magical virtues; it is only a tool for achieving a specific goal more efficiently. The goal is never to build a model, but to address the system.

The operative nature of models is closely related to the simplification I mentioned above. To simplify means to choose those elements of the system that are worth including and those that should be discarded. The selection is guided by the goals of the model, such as which calculations and predictions you are trying to assist.

Simplification and modeling

It is difficult to emphasize enough that models are a simplification of an actual system. Again, it is impossible to tackle a complex system unless it is broken down into smaller, simpler elements. In practice, one model is not always enough and a complex system may be represented by a range of models, from simple to complex.

The modeling process may start with a sketch of a system on the legendary napkin (a white board or a regular sheet of paper are good alternatives). The first model is usually very rough, ignoring most aspects of the system other than the few aspects that the designer has identified as essential, either because they are particularly complex or because they are key differentiating factors.

This rough model will be refined into one or more models of increasing sophistication and complexity. Each iteration incorporates more elements from the actual system until all the relevant aspects have been incorporated into the model. Ultimately, you'll reach the implementation data model that defines all the aspects that the system can manage.

With XML, the implementation will be an XML schema. Alternatives include a DTD, RELAX NG, or WSDL (see Resources). Although technical differences between these implementations exist, in this series I will treat them as variations on XML schemas.

The industry generally takes two views on the relationship between the models and the XML schema. Some authors draw a clear line between the design models, typically UML models or entity-relationship models which are supposed to be abstract, and the XML schemas which include lots of implementation details. This distinction promotes a clean separation between the modeling activity and the implementation activity. Modeling is typically done by business analysts, while implementation is the responsibility of technicians. This division of work mimics the division of work between the analyst and the developer in typical application development.

While I think the separation is sensible for programming, I am not sure it is always applicable to XML modeling -- which leads me to the second industry perspective on this relationship. An XML schema is a model of a document and, as you will see in my next article, it is not dramatically more sophisticated than a good UML model. Granted, an XML schema contains a lot of technical information, but it is not uncommon for a UML model to capture almost as much technical information. So I prefer to view the XML schema as part of a continuum of models, from the high-level model to the low-level one.

Viewing the schema as just another model is particularly relevant when you install tools to assist in the modeling, as I will suggest in the remaining articles in this series.

Simplification and graphics

One of the most effective simplifications used in modeling is graphics. The mind finds it easier to work with a graphic than with a long list of complex instructions. Most modeling methodologies are built on a visual language such as UML, entity-relationships, or flow charts.

When it comes to XML schemas, opinions on the best visual language generally fall into two camps. One approach is to use an XML-specific language; the other is to use a more generic modeling language. Products like XML Spy or TurboXML (see Resources) use a custom graphical tree representation to manipulate XML schemas. A visual rendering might look similar to Figure 1:


Figure 1. Visual XML structure

The alternative is to use a standard modeling language, such as UML, for this purpose. Figure 2 is a UML model that is similar to Figure 1:


Figure 2. UML for XML modeling

Each approach has its benefits and drawbacks. XML-specific symbols are a perfect match for the XML constructs: It is easy to identify an XML sequence, an XML choice, elements, attributes, and more. It is also possible to specify all the technical information in a simple and natural way. Until recently, many designers of XML applications would have recommended this approach because it is simple and effective.

The price to pay is that the modeling and the tools often do not integrate well with the rest of the development effort. While this approach remains suitable for small XML projects, it does not scale well. It is difficult to work with a large, complex model because the visual language offers only one level of abstraction. It is also difficult to work on large projects that combine XML, Java, Web Services, and SQL because everybody else in the team may be using UML.

UML is best suited for medium and large scale projects for two reasons:

  1. UML applies to Java, C++, Python, PHP, SQL, Web services, and just about any other development technology. Its universality reduces the training needs (one language works for everybody), and it is easier to share designs across the team.
  2. UML diagrams can show as much or as little information as necessary, so it is possible to prepare several models of increasing sophistication with the same tool.

The major downside of UML is that it is less friendly when working with the low-level aspects of modeling. For example, it is easy to order the elements of a sequence in a tree, but it is very tricky to do so in UML.





UML and XML

I plan to revisit this topic at length in the next few articles. For now, it suffices to say that many mappings are possible between an XML schema and a UML model. UML supports several diagrams, including use case diagrams, package diagrams, sequence diagrams, and activity diagrams.

The most suitable diagram for my purposes here is the class diagram, which represents an object-oriented data model.

Figure 3 is a very simple UML model for a person. It consists of two classes, one for a person's primary data ("person") and one for his or her "address". The rectangle is the symbol for a class and is divided into three parts: class name, attributes, and methods. Because you'll be modeling data rather than behavior, you can ignore the methods.


Figure 3. UML person model

Relationships between classes are represented through associations. In a model, an association is drawn as a line. The line may be adorned with connectors to differentiate associations. For example, in Figure 3 the solid diamond indicates that the relationship is a composition -- in other words, that instances of the address class can only exist within the context of an instance of the person class.

Note that many options are available for mapping UML constructs to XML, with UML attributes being the best illustration. In UML, an attribute is a field attached to a class. In the Java language, only one sensible mapping exists: The attribute becomes a field of the class. In contrast, in XML the attribute may map either to a subelement or to a proper XML attribute. I'll revisit this topic in future articles.
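To make the two XML options concrete, here is a minimal Python sketch that serializes the same instance both ways. The attribute names and values (name, email) are made up for illustration and are not taken from Figure 3; the two helper functions are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute values for one instance of the "person" class.
person = {"name": "Jane Doe", "email": "jane@example.com"}


def as_subelements(class_name, attributes):
    """Option 1: map each UML attribute to a child element."""
    root = ET.Element(class_name)
    for attr_name, value in attributes.items():
        ET.SubElement(root, attr_name).text = value
    return root


def as_xml_attributes(class_name, attributes):
    """Option 2: map each UML attribute to an XML attribute."""
    return ET.Element(class_name, dict(attributes))


print(ET.tostring(as_subelements("person", person), encoding="unicode"))
print(ET.tostring(as_xml_attributes("person", person), encoding="unicode"))
```

Both documents carry the same information; which form is preferable depends on the vocabulary being designed, so a derivation tool needs a rule (or a hint in the model) to choose between them.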





Round-trip schema derivation

When working with UML models, you can try several different approaches:

  • You can draw them on paper or on white boards.
  • You can use a vector drawing tool such as SmartDraw or OmniGraffle (see Resources).
  • You can use a modeling tool such as IBM Rational Rose (see Resources).

For all but the simplest models, you will want to use a modeling tool. At first sight, a modeling tool may appear to be nothing more than a glorified drawing tool, but it offers much more. A modeling tool understands the model and therefore can provide a lot of assistance to the designer. For example, when you add a class to a diagram, it can draw all of that class's relationships automatically.

Automatic derivation

As I have stated already, I believe the XML schema is just a specific rendering of a very detailed model. Therefore, it is essential to derive the XML schema automatically from the UML model.

Looking at a UML model and attempting to code it in XML schema form can be very time consuming and error-prone. Chances are you will miss some elements or attributes, and it's easy to get relationships wrong. Fortunately, this process is easy to automate if you establish a one-to-one mapping between UML constructs and XML schema statements.
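As a rough illustration of what a one-to-one mapping can look like, the Python sketch below applies two fixed rules: every class becomes an xs:complexType and every attribute becomes an xs:element inside an xs:sequence. The in-memory MODEL dictionary, its class and attribute names, and the derive_schema function are assumptions made for this example; they are not the mapping developed later in this series.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical model: class name -> list of (attribute name, XSD type).
MODEL = {
    "person": [("name", "xs:string"), ("birthDate", "xs:date")],
    "address": [("street", "xs:string"), ("city", "xs:string")],
}


def derive_schema(model):
    """Rule 1: class -> complexType. Rule 2: attribute -> element in a sequence."""
    schema = ET.Element(f"{{{XS}}}schema")
    for class_name, attributes in model.items():
        complex_type = ET.SubElement(
            schema, f"{{{XS}}}complexType", name=class_name
        )
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for attr_name, xsd_type in attributes:
            ET.SubElement(
                sequence, f"{{{XS}}}element", name=attr_name, type=xsd_type
            )
    return schema


schema = derive_schema(MODEL)
print(ET.tostring(schema, encoding="unicode"))
```

Because each rule is mechanical, no element or attribute can be forgotten, which is the main advantage over hand-coding the schema.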

You will find a number of tools that can be used to derive schemas from models automatically, including:

  • The Eclipse Modeling Framework (EMF), formerly known as the IBM XMI Toolkit (see Resources). EMF includes a code generator that derives an XML schema from a UML model.
  • IBM Rational Rose, which includes RoseScript, a scripting language that can manipulate a UML model and therefore save it as an XML schema.
  • Velocity, a project from Apache Jakarta, is a template engine that has been used to generate code from UML models.
  • hyperModel is a graphical tool that specializes in UML-to-XML generation.
  • Poseidon for UML has a built-in code generation feature that is easily customized to generate XML schema.
  • Codagen offers sophisticated code generation capabilities in a UML tool.

In this series, I will propose a solution built on XSLT and XML Metadata Interchange (XMI). XMI is a standard format that you can use to export UML models in XML. It was originally designed to allow the exporting/importing of models between different tools, but since it is XML, you can manipulate it in XSLT.
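Because an XMI export is ordinary XML, any XML tool can read it. The sketch below uses Python's ElementTree rather than XSLT (the Python standard library has no XSLT processor) to list the classes in a document. The fragment is a simplified, hand-written approximation in the spirit of XMI 1.x: the namespace URI and the flat element layout are assumptions, and real exports are more verbose and vary between tools and XMI versions.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XMI-like fragment (not a real tool export).
XMI_SAMPLE = """\
<XMI xmlns:UML="http://example.org/uml">
  <XMI.content>
    <UML:Class name="person">
      <UML:Attribute name="name"/>
      <UML:Attribute name="email"/>
    </UML:Class>
    <UML:Class name="address">
      <UML:Attribute name="city"/>
    </UML:Class>
  </XMI.content>
</XMI>
"""

# Prefix map for the assumed namespace used in the sample above.
NS = {"UML": "http://example.org/uml"}


def list_classes(xmi_text):
    """Return {class name: [attribute names]} for every UML:Class found."""
    root = ET.fromstring(xmi_text)
    return {
        cls.get("name"): [a.get("name") for a in cls.findall("UML:Attribute", NS)]
        for cls in root.iterfind(".//UML:Class", NS)
    }


print(list_classes(XMI_SAMPLE))
```

An XSLT stylesheet would express the same traversal with templates that match the class and attribute elements.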

In my work, I have found it very advantageous to work with XMI and XSLT for the following reasons:

  • XMI is an industry standard, supported by all the major modeling tools. Anything built on top of XMI works with these modeling tools.
  • XSLT is an effective language for expressing transformations. It has many constructs that lend themselves well to this task.
  • XSLT is available on all major platforms so it does not limit my choices in any way.
  • The same technique is easily extended to WSDL and other XML languages.

I have another criterion, which may or may not be relevant for you. I work mostly in e-commerce, so the models I work on are a collaborative effort between several companies. Because different companies may adopt different tools, I can't impose one proprietary product on an entire team when I'm working on a collaborative effort. Because XMI is an industry standard, a solution that builds on XMI generally works well for the whole team.

Figure 4 illustrates this process. I write one or two stylesheets to derive the XML schema from the XMI document, that is, from the UML model.


Figure 4. Deriving the schema

I may also prepare a stylesheet that implements the reverse procedure: from XML schema to XMI. This stylesheet is particularly useful when working with existing schemas that have not been modeled in UML.
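The reverse direction can be sketched in the same spirit. The Python fragment below, which stands in for a stylesheet, walks a small schema and recovers a class-like summary: each named xs:complexType is treated as a class, and its xs:element and xs:attribute declarations as that class's attributes. The sample schema and the recover_classes function are hypothetical, and a real schema would need many more cases (references, anonymous types, choices, and so on).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema that was written by hand, without a UML model.
XSD_SAMPLE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>
</xs:schema>
"""


def recover_classes(xsd_text):
    """Return {type name: [member names]} for each top-level complexType."""
    root = ET.fromstring(xsd_text)
    classes = {}
    for complex_type in root.findall(XS + "complexType"):
        members = []
        # iter() visits descendants in document order.
        for node in complex_type.iter():
            if node.tag in (XS + "element", XS + "attribute"):
                members.append(node.get("name"))
        classes[complex_type.get("name")] = members
    return classes


print(recover_classes(XSD_SAMPLE))
```

Note that this summary no longer records whether a member was an element or an attribute in the schema; a round-trip solution would have to preserve that distinction in the model.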





Till next time

In this first article of the series, I reviewed the principles behind document modeling and surveyed how to model an XML schema in UML. More importantly, I showed you tools for generating the XML schema automatically from the UML model. Automatic generation is possible using XMI and an XSLT stylesheet. I will present an example of this stylesheet in the next installment.



Resources



About the author


Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. You can contact him at bmarchal@pineapplesoft.com or through his personal site at www.marchal.com.



