Level: Introductory
Michael Russell (MikeRussell@VickiFox.com), Application Architect, Vicki Fox Productions
08 Sep 2004
To customize applications and program products for a specific operational environment, you must modify one or more configuration objects. These configuration objects can take many forms, such as text files, XML files, system registries, or a separate service. Managing the operational environment becomes more complex as the number of configuration objects increases.
I continue the Quality busters series, which looks at common influences
on application quality from the enterprise view of the operational environment and
non-functional requirements. Addressing these influences is a matter of making
tradeoffs, with no single solution solving all the problems. This month I'll discuss
the complexities of application configuration management.
Put what value where?
The operations team notifies the SHEEP Web application team that they are moving
several servers to a new data center. Part of this move requires changing the host name
for two systems. In keeping with data center naming standards, the WebSphere® MQ queue
manager name on these systems will also change. The SHEEP team replies that the change should
be easy; they just have to update the configuration objects on the two systems.
Plans are made, tasks assigned, and the day of the system move arrives. The SHEEP
team member assisting with the move of the systems updates the known configuration files.
The day after the move, the VP of Finance complains that his sales status report
from the Data Warehouse system is not updating.
Operations, the SHEEP team, and the Data Warehouse (DW) team research the cause.
They discover that one of the DW programmers took advantage of the SHEEP application's sales status request and reply messages. Instead of using the SHEEP application's configuration objects, the DW programmer created his own configuration object. The teams overlooked this configuration object during the system move. Once the DW programmer updates the configuration to reflect the new location and name of the request queue, the DW system updates again.
General attributes
You can customize nearly every application by setting one or more
configuration values. Configuration objects store these configuration values.
Configuration values identify the system environment to the program; for example, queue
and queue manager names, remote system identity, user logins and passwords, locale
settings, timeout intervals, and more. Configuration values also identify user settings; for example, which features are enabled or disabled, default values for screen or
processing elements, default user identification, personalization, and more.
Formats
You'll find configuration objects in many formats. The following are the more popular ones.
Text file
The classic configuration object is a text file that contains key-value information.
Usually, each line in the text file corresponds to a configuration key-value pair. The key appears on the left, followed by a separator symbol (commonly an equals sign, a colon, or a space), and then the value. Sometimes a special separator line, often called a section heading, is included.
Examples of this format include the INI file found in DOS/Windows and the properties file (java.util.Properties Class) in Java.
Listing 1. Example of key-value format
# Sample configuration file lines
server.queueManagerName = SHEEPQM1
server.requestQueueName = RQSTQ
server.queueTimeout = 1000
Advantages:
- Many parsers or access modules are already available.
- Very low overhead is associated with opening and reading the text file.
- The information is human-readable.
- You can edit the contents with simple text editors.
Disadvantages:
- Hierarchical information is difficult to store.
- Repeating information groups are difficult to store.
- Editing can be error prone whenever you have a large number of key-value pairs.
- A string representation stores data and requires conversion to integer or other binary
representations.
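As a minimal sketch of reading this format from Java, the following uses java.util.Properties to parse the lines from Listing 1 and convert the timeout to an integer. The class name KeyValueConfigSketch and its helper methods are hypothetical names chosen for illustration, and the fallback value 5000 in main is an assumption rather than anything the SHEEP application defines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
class KeyValueConfigSketch {

    /** Parses key-value text in the java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Converts a stored string to an int; returns the default when the key is absent. */
    static int getInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String text = "# Sample configuration file lines\n"
                + "server.queueManagerName = SHEEPQM1\n"
                + "server.requestQueueName = RQSTQ\n"
                + "server.queueTimeout = 1000\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("server.queueManagerName"));
        // 5000 is an assumed fallback used only when the key is missing.
        System.out.println(getInt(props, "server.queueTimeout", 5000));
    }
}
```

Every value comes back as a string, so the explicit conversion in getInt is the step that the last disadvantage above refers to.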
XML file
The XML format is growing in popularity, although XML has no standard configuration
file format. It seems every application does it differently. Some cases use element
attributes, while others use only element tags (see the
example in Listing 2). You can name each repeating group with
id or name attributes on a commonly named element tag, or with
a uniquely named element tag.
Listing 2. Example of the XML format
<!-- Sample configuration file in XML -->
<config>
<server>
<queueManagerName>SHEEPQM1</queueManagerName>
<requestQueueName>RQSTQ</requestQueueName>
<queueTimeout>1000</queueTimeout>
</server>
</config>
Advantages:
- Standard XML parsers are available.
- Hierarchical information is easy to store.
- Repeating information groups are easy to store.
- The information is human-readable.
- You can edit the contents with text editors and XML editors.
Disadvantages:
- Increased overhead is associated with opening and parsing the XML document.
- Information is difficult to read when a large number of elements are present.
- Data is stored in a string representation which requires conversion to integer or other binary
representations.
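To illustrate reading an element-only layout like the one in Listing 2, here is a minimal sketch that uses the standard JAXP DOM parser in the Java SDK. The class name XmlConfigSketch and the method firstElementText are hypothetical, and looking up an element by tag name alone is a simplification that assumes tag names are unique in the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class XmlConfigSketch {

    /** Returns the trimmed text of the first element with the given tag name, or null if absent. */
    static String firstElementText(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config><server>"
                + "<queueManagerName>SHEEPQM1</queueManagerName>"
                + "<requestQueueName>RQSTQ</requestQueueName>"
                + "<queueTimeout>1000</queueTimeout>"
                + "</server></config>";
        System.out.println(firstElementText(xml, "queueManagerName"));
        // The value is still a string and needs conversion before use as a number.
        System.out.println(Integer.parseInt(firstElementText(xml, "queueTimeout")));
    }
}
```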
Registry
A registry is a special index object, usually in binary format, that efficiently
stores configuration information in a hierarchical structure. Microsoft Windows,
for example, implements a system registry.
Advantages:
- You access the registry from a simple, consistent API. This API hides the location
of the registry from the application program.
- Hierarchical information is easy to store.
- Repeating information groups are easy to store.
- Data is stored in a data-type representation more appropriate for the value's usage.
- All applications running on the system access a single location.
Disadvantages:
- The information is not human-readable.
- The binary format requires special editing programs, preferably tailored to the application, to view and edit the configuration values.
- The configuration values for an application are difficult to extract and store in a form
that can be saved with the application for backup and recovery.
Directory service
A directory service is a set of programs and processes that provides directory
lookup services. An application sends a request (through a message or remote procedure call)
to the directory service, which sends a reply. The directory service can store key-value pairs in a hierarchical structure. An example of a directory service is the
X.500 Directory Services, which is accessed using LDAP (Lightweight Directory Access Protocol).
Advantages:
- Directory services are separated from the application and can reside on the same or
separate computer systems.
- A consistent API is available to access the service.
- Hierarchical information is easy to store.
- By being system-independent, a single service can be the repository of shared
configuration information for many applications running on many computer systems.
Disadvantages:
- The information is not human-readable.
- The service requires special edit programs to view and edit the configuration values.
- Reliability concerns arise if the directory service is not accessible due to the service not running, a broken communications connection, or something else.
- The configuration values for an application are difficult to extract in a form
that can be used for application backup and recovery.
- A local configuration object still must store the naming and
routing information needed by the application to identify the directory service.
Preferences
The Java 2 SDK, Standard Edition, Version 1.4, introduces a new class
called Preferences (java.util.prefs.Preferences). (See Resources.) The standard allows Preferences to
be stored in an implementation-dependent back-end, which could be a file, an LDAP
directory server, the Windows Registry, or some other storage mechanism.
Advantages and disadvantages:
- The advantages are those of the implementation-dependent back-end, which is one of
the formats previously listed.
- The disadvantages are, likewise, those of the implementation-dependent approach.
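The following minimal sketch shows how a program might read values through java.util.prefs.Preferences without knowing which back-end holds them. The node path com/example/sheep/server, the key names, and the default values are assumptions made for illustration; only the Preferences calls themselves (userRoot, node, get, getInt) come from the Java SDK.

```java
import java.util.prefs.Preferences;

// Hypothetical helper class, for illustration only.
class PreferencesSketch {

    // Assumed node path; a real application would choose its own.
    static final String NODE_PATH = "com/example/sheep/server";

    /** Reads the queue manager name, falling back to an assumed default. */
    static String queueManagerName(Preferences node) {
        return node.get("queueManagerName", "SHEEPQM1");
    }

    /** Reads the timeout as an int; no string conversion is needed in the caller. */
    static int queueTimeout(Preferences node) {
        return node.getInt("queueTimeout", 1000);
    }

    public static void main(String[] args) {
        Preferences node = Preferences.userRoot().node(NODE_PATH);
        System.out.println(queueManagerName(node));
        System.out.println(queueTimeout(node));
    }
}
```

The calling code looks the same whether the values end up in a file, a registry, or another store, which is why the trade-offs depend on the back-end rather than on the API.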
Database
You might store configuration information in a database table. One approach
is a table with a separate column for each configuration element and a single row
in the table. Reading this row retrieves all the configuration information at
once. Another approach is a table with two columns -- a key column and a value
column. Each key-value pair forms a row.
Listing 3. Example of SQL configuration table
CREATE TABLE the_config (
queue_manager_name VARCHAR(32)
NOT NULL DEFAULT('SHEEPQM1'),
request_queue_name VARCHAR(32)
NOT NULL DEFAULT('RQSTQ'),
queue_timeout INTEGER
NOT NULL DEFAULT(1000)
);
Advantages:
- Database methods, such as SQL over JDBC, can access the data.
- Parsing of values is unnecessary since information is stored in a more appropriate data representation.
- Many applications running on many computer systems can easily access configuration information.
Disadvantages:
- The information is not directly human-readable.
- The format requires special database query tools or custom edit programs to view and edit the values.
- Reliability concerns arise if the database is not accessible.
- If a schema stores the configuration data separately from the application data, application configuration values might be difficult to extract and save for backup and recovery
purposes.
- A local configuration object must store database access information.
Environment variables
Most operating systems provide support for environment variables or system variables. Each process, when it starts, is loaded with a copy of the system-level environment variables. The process can then change the value of these variables or define additional environment variables. A program can retrieve the value of these environment variables. As a result, environment variables provide a facility for process level management of configuration information.
Listing 4. Example of DOS script with environment variables
set QUEUE_MANAGER_NAME=SHEEPQM4
echo %QUEUE_MANAGER_NAME%
myApplication.exe
Advantages:
- You can define environment variables at the process level.
- A parent program in the process can change the environment variable, thus affecting a
child program that starts afterwards.
- The information is generally human-readable since the setting of environment variables often occurs within runtime scripts.
Disadvantages:
- The assignment of environment variables often repeats in most runtime
scripts. This repetition creates a maintenance problem as you must find and update all copies as needed.
- Diagnosing a problem increases in difficulty if another task in a process changes certain configuration information at runtime.
- A programmer must change information since these variables are not in a location that users can typically access.
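On the program side, retrieving such a variable might look like the sketch below. The class name EnvConfigSketch and the fallback value SHEEPQM1 are assumptions for illustration. The lookup takes the environment as a map so that it can be exercised without changing the real process environment.

```java
import java.util.Map;

// Hypothetical helper class, for illustration only.
class EnvConfigSketch {

    /** Returns the variable's value, or the default when it is unset or empty. */
    static String fromEnv(Map<String, String> env, String name, String defaultValue) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        // System.getenv() returns the copy of the environment this process started with.
        System.out.println(fromEnv(System.getenv(), "QUEUE_MANAGER_NAME", "SHEEPQM1"));
    }
}
```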
Command-line parameters
Finally, some configuration information can pass to the program through
command-line parameters. Command-line parameters can select which configuration object to use or how to find it, and they can override specific values found in other configuration objects. Command-line parameters provide a facility for program-level management of configuration information.
Listing 5. Example of command-line parameters
myApplication.exe -qm:SHEEPQM4
Advantages:
- Command-line parameters are defined at the program level.
- Programmers can easily force overrides to the default sources of
configuration values.
- Programming is relatively easy since command-line parameters have no external references to files or services.
Disadvantages:
- Maintenance efforts increase because you must search all runtime scripts to find parameter usage.
- Dynamically computed parameters can create diagnostic issues.
- Accessibility diminishes with values stored in sources that only programmers can modify.
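A minimal sketch of parsing parameters in the -name:value style of Listing 5 follows. The class name CommandLineSketch is hypothetical, and silently ignoring arguments that do not match the pattern is an assumption; a real program might reject them instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class CommandLineSketch {

    /** Parses arguments of the form -name:value into a map; other arguments are ignored. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int colon = arg.indexOf(':');
            // Require a leading dash and at least one character of name before the colon.
            if (arg.startsWith("-") && colon > 1) {
                options.put(arg.substring(1, colon), arg.substring(colon + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(new String[] {"-qm:SHEEPQM4"});
        System.out.println(options.get("qm"));
    }
}
```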
A combination of formats
Using a combination of formats is usually a good idea. When you add the ability to
override configuration values, the programmer can selectively test pieces of the
program without the worry of managing configuration objects -- which might be
shared by other users and developers.
A common combination approach goes like this:
- Use a search path, such as the classpath in Java environments,
to search for the configuration object. If no object is found, then attempt reading
from a default location.
- Override the configuration values with the values of any matching environment variables.
- Override the configuration values with the values of any command-line parameters.
- Log the final configuration values to assist with diagnosing program problems.
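The override and logging steps above can be sketched as a simple merge of layers, where each later layer wins. The class LayeredConfigSketch is hypothetical, and the sketch assumes each source has already been read into a map that uses the same key names, which real environment variables and command-line switches usually would not without a mapping step.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
class LayeredConfigSketch {

    /** Merges the layers in order; values in later layers override earlier ones. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> result = new TreeMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    /** Formats the final values, one key=value per line, for a diagnostic log. */
    static String describe(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample layers: configuration object, environment, command line.
        Map<String, String> fromFile = Map.of("queueManagerName", "SHEEPQM1", "queueTimeout", "1000");
        Map<String, String> fromEnv = Map.of("queueManagerName", "SHEEPQM4");
        Map<String, String> fromArgs = Map.of("queueTimeout", "250");
        System.out.print(describe(merge(fromFile, fromEnv, fromArgs)));
    }
}
```

A production version would probably mask passwords and similar sensitive values before writing them to a log.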
Data representation
Configuration objects store the key-value information in one of two
possible data representations: string- or data type-enabled.
String
The key-value text file, XML file, environment variable, and command-line parameter
formats store values in a string representation. The using program must convert from
the string representation to the desired internal representation. While the string
representation makes it easy to edit the configuration object, it does lend itself
to the entry of incorrect values. For example, a user might type the letter 'O'
instead of the number '0' and a text editor cannot detect this.
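Because the editor cannot catch such a mistake, the program has to. The sketch below shows one way to validate a string value at conversion time and report which setting was wrong. The class TimeoutParserSketch and the rule that the timeout must be non-negative are assumptions for illustration.

```java
// Hypothetical helper class, for illustration only.
class TimeoutParserSketch {

    /** Converts the stored string to an int, rejecting malformed or negative values. */
    static int parseQueueTimeout(String raw) {
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "queueTimeout must be a whole number, got: '" + raw + "'", e);
        }
        if (value < 0) {
            throw new IllegalArgumentException("queueTimeout must not be negative: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseQueueTimeout("1000"));
        try {
            parseQueueTimeout("1O00"); // letter O typed instead of zero
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```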
Data type-enabled
The registry, preferences, and database formats store values in data type-specific representations. For example, numbers are stored in a numeric format, usually integer. This reduces the need for the program to convert values from one representation to another. It also reduces the likelihood of entering incorrectly formed values. However, with this representation, you need special programs to edit the configuration object.
Location
A major decision regarding configuration objects is where to put them. You have
several options:
- Script
- Program directory
- Fixed directory
- Search path
- Separate service
Real world applications often use a combination of locations
involving many configuration objects.
Script
For configuration information that is very program instance-specific, you might find it advantageous to put some information into environment variables or command-line parameters within the script that launches the program. This approach is rarely used because you must search all scripts to find out whether a changing configuration item is referenced within a script.
Program directory
You might place a configuration object in the same directory where the program
itself resides. Finding the configuration object is easier since the program
can determine where it resides and simply check that directory. This approach has
limited ability to share configuration information. Only programs in the same
directory can share the configuration object; programs in another directory
are not able to find it.
Fixed directory
On systems such as UNIX, Windows, or OS/400, with a well-known and stable
directory structure, you might place configuration objects in a well-known fixed
directory, such as the root directory or the QGPL library. All programs on the system
can access this fixed directory. As a result, many programs and applications can share the configuration object. In most business environments, the technical support team does not permit the addition of user objects to these fixed directories, so this approach is often discouraged for operational support and security concerns.
Search path
Most systems provide a search path capability, such as the UNIX PATH environment
variable or the Java CLASSPATH variable. By checking each directory in the search
path for the configuration file, the program has more flexibility. This approach
also supports testing better because a tester can put a tailored configuration object
in the search path earlier. This benefit, however, is also its weakness. If, during
operation, an incorrect version of the configuration object is inserted earlier in
the search path, then the program will likely perform differently than expected. This can be difficult to diagnose.
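A search of this kind can be sketched in a few lines. The class SearchPathSketch and the idea of passing the directories as an ordered list are assumptions for illustration; the list could be built from an environment variable or any other source.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper class, for illustration only.
class SearchPathSketch {

    /** Returns the first directory entry on the search path that contains the named file. */
    static Optional<Path> findFirst(List<Path> searchPath, String fileName) {
        for (Path dir : searchPath) {
            Path candidate = dir.resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate); // earlier directories win
            }
        }
        return Optional.empty();
    }
}
```

Logging the path that was actually chosen is a cheap way to make the "wrong version earlier in the path" problem easier to diagnose.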
Separate service
Finally, using approaches such as the registry, directory service, or database,
you can separate the configuration object altogether from the application. The
configuration information might even reside on a different computer system. However,
as mentioned before, this approach requires a small local configuration object
that identifies how to access the configuration service. Also, this approach has a
reliability concern if the service becomes inaccessible.
Scope
A similar consideration to location is deciding the scope of the configuration object. That is, how many program components will use the configuration object. Scope includes several levels -- program, process, application, system, or enterprise. Real world applications often combine scope levels involving many configuration objects. These levels are as follows:
- Program. The configuration information is applicable to a specific instance of a program. A session identifier is one example.
- Process. The configuration information is applicable to all threads, units of execution, and program modules that operate within the life of a process. The name of a response queue associated with the process is an example.
- Application. The configuration information is applicable to all programs that comprise an application. The database connection details are an example.
- System. The configuration information is applicable to all programs, independent of the owning application, that reside on a computer system. The computer name and operations notification console are examples.
- Enterprise. The configuration information is applicable to all computer systems
within the enterprise. The names of the enterprise domain name servers (DNS) are an example.
Retrieval frequency
Another important consideration regarding configuration objects is how often a program retrieves values from the object. Your decision will be influenced by how often configuration values might change and by the business rules regarding how up-to-date the program must be. The more frequently a program retrieves configuration values, the more overhead the program will have. If this is the case, the architect should choose a configuration object that has lower overhead associated with value retrieval. The following are commonly encountered retrieval frequencies:
- Program startup. The program reads the configuration object once, when the
program starts. Any changes to the configuration object are ignored until the program
is restarted.
- Periodic refresh. The program re-reads the configuration object on a periodic
basis. Any changes are detected at the next scheduled refresh from the configuration object.
- Triggered refresh. A trigger in the program can force a re-read of the
configuration object. The trigger might be a signal, a special message, a detected change in the configuration object's modify date, or some other event.
- Transactional. The program re-reads the configuration object for each transaction. With this approach, the program guarantees it is using the latest configuration value.
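As one illustration of a triggered refresh, the sketch below re-reads a properties file only when its modify date has changed since the last load. The class ReloadingConfigSketch is hypothetical, and checking the timestamp on every lookup is just one possible trigger; a signal or special message would need different plumbing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
class ReloadingConfigSketch {
    private final Path file;
    private FileTime lastLoaded;            // modify date seen at the last load
    private Properties current = new Properties();
    private int reloadCount;

    ReloadingConfigSketch(Path file) {
        this.file = file;
    }

    /** Returns the value for the key, reloading first if the file has changed. */
    synchronized String get(String key) {
        refreshIfChanged();
        return current.getProperty(key);
    }

    /** Number of times the file has been read, exposed for diagnostics. */
    synchronized int reloadCount() {
        return reloadCount;
    }

    private void refreshIfChanged() {
        try {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.equals(lastLoaded)) {
                return; // unchanged since the last load
            }
            Properties fresh = new Properties();
            try (Reader reader = Files.newBufferedReader(file)) {
                fresh.load(reader);
            }
            current = fresh;
            lastLoaded = modified;
            reloadCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checking the timestamp costs far less than re-parsing the file, so this sits between the startup-only and transactional approaches in overhead.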
Maintenance
Finally, you must make a decision about who will maintain the configuration
information -- developers, the operations department, or the users. In reality,
you often use a combination of all three. Developers might maintain configuration
values that support the ability to diagnose the programs. Operations might maintain
configuration values that represent the system infrastructure and runtime environment.
Finally, users might maintain personalization, locale, and other usage-oriented
configuration values.
A tailored configuration edit program is generally beneficial for operations and user personnel to use. The tailored program can ensure that
configuration values are correct and meaningful before they are saved to the configuration object. This edit program might be part of the application program itself, similar to the Tools -> Options dialogue in many Windows-based applications.
Considerations
With so many different approaches and formats associated with configuration objects, how do you, as an architect, decide what to use?
As this series will show, it is all a matter of making trade-offs. The discussion above showed some of the advantages and disadvantages associated with each approach and format. In the end, you might end up using several approaches.
The architect might ask some of these questions before deciding which approach to take.
- What types of configuration items do you need to store in a configuration object?
- Can you dynamically compute the configuration item?
- What is the set of valid values for each configuration item?
- Is a default value associated with the configuration item, or is a user-specified value required?
- Where is enforcement of the configuration item values located -- in the program after reading the value, or in an edit function that updates the value?
- What is the scope of the configuration item?
- What is the appropriate location, format, and retrieval frequency for the configuration item?
- Who maintains the configuration item -- a programmer, operations team, or end-user?
- Can existing tools (such as a text editor) edit the configuration object, or must you create a custom editor?
- If you create a custom editor, can you integrate it into the application, or will it be a stand-alone program?
- Is the configuration item subject to security concerns? For example, storing a database access password in a text file might not be acceptable to security audits. To meet any security requirements, how will you store the configuration item -- plain text, encrypted, or in a secured object?
- Must you synchronize the configuration item across several systems?
- How often will the configuration item be read from the configuration object?
- How should the program behave when a configuration item is not found?
- How should it behave when a configuration object is corrupted?
- Should the application repeat configuration information from another application
or use the other application's configuration object directly?
Repeating the information can result in replication and synchronization issues.
Reusing the other application's configuration object can result in tight coupling and
dependency issues.
In summary
In this column, I presented many approaches to storing and managing configuration
(or customization) information, along with the advantages and
disadvantages associated with each approach. This guide is meant to be
representative and not exhaustive. The goal is to challenge you, as an architect,
to think about the effects on the operational environment and non-functional
requirements brought about by your chosen approach.
Resources
- Read the author's other articles in the Quality busters series on developerWorks.
- If you work with Windows-based initialization (INI) files, try the
GetPrivateProfileString(...)
and GetProfileString(...) functions in the Windows SDK.
- If you work with the System Registry, try several APIs, such as
RegQueryValueEx(...) and
RegOpenKeyEx(...) from the Windows SDK.
- To work with key=value pair text files, use the java.util.Properties class in the Java SDK. In version 1.4, the Java SDK introduces the
java.util.prefs package, which includes the
java.util.prefs.Preferences class, for working with
implementation-dependent configuration objects, such as the System Registry
on Windows platforms.
- Find more information about the LDAP protocol and learn to access directory
services with it in this recently updated IBM Redbook, Understanding LDAP - Design and Implementation SG24-4986-01 (June 2004).
- Get an example of using XML for a configuration file in "Java configuration with XML Schema" by Marcello Vitaletti (developerWorks, November 2001).
About the author
Michael Russell has a Bachelors degree in Physics and a Masters
degree in Computer Science. He was a logistics engineer, a
technical services manager, and a certified IT architect at IBM
for nearly 14 years. Michael has experience in Windows, UNIX,
and OS/400 environments and is currently a Web application
architect for a resort company in Orlando. He uses Web technology
for entertainment through his own company, Vicki Fox
Productions (http://www.VickiFox.com).
He can be reached at MikeRussell@VickiFox.com.