Level: Intermediate
Rakesh Midha (midharakesh@in.ibm.com), Software Engineer, IBM Software Labs, Bangalore
17 Sep 2004
The Dictionary and Thesaurus API for Java (JADT) is a class library for accessing linguistic features in your Java applications. In this article, Part 1 of a two-part series, JADT developer Rakesh Midha discusses JADT from the user's point of view and shows how it is used in a Java application. He discusses the various classes and designs involved in JADT, along with features available with it. Part 2 goes into more detail about the architecture and API, and provides guidance for those wishing to implement a JADT driver.
The Dictionary and Thesaurus API for Java (JADT), an API for wordbook features published on alphaWorks, is a standards-based class library used to access linguistic features in Java applications. It provides Java programmers with a transparent Java-centric way to access dictionary and nonstructural words, and information about them. This article, the first of two, will cover using JADT to develop a dictionary- and thesaurus-enriched Java application. The second article provides a detailed view of the architecture and API.
Overview of JADT
JADT is an API used to access the wordbook data from the Java programming language. It provides generic interfaces that can be used in your Java applications, independent of the implementation of a dictionary/data provider. Using JADT, applications written in the Java language have access to definitions, pronunciations, synonyms, antonyms, and so on for a particular word.
JADT's features include:
- An interface for accessing dictionary, thesaurus, and other services
- An interface for multilingual conversion
- A driver-based access protocol
- APIs for dictionary providers or driver developers
- Access to backend dictionaries or thesauruses, such as databases, file systems, or XML-based files (provided that their drivers are implemented)
JADT is useful for applications that access wordbook data because it provides simple, standard techniques for the task. It spares application programmers the headache of maintaining the data and its access points.
The implementation of this API can be very useful for a range of applications that use multilingual features, localization, and so on. For example, an editor can use this technology for spellchecking or to get suggestions for more appropriate words during editing. Another example use might be to enable an application that requires translations to use a multilingual dictionary and thesaurus.
In addition, JADT provides a language-neutral interface, which makes it useful for a variety of languages and locales. Also, because JADT is written in the Java language, and its sample drivers are implemented in the Java language, it's platform-independent.
JADT data structures
JADT includes various components and services, depending upon the feature or particular subsection of JADT being used. The first stop on our tour of JADT is a set of classes and interfaces that offer a generic way to organize data. Because all the services use these data structures, it is very important that JADT users are aware of them.
WordList
The WordList class is just a container for the words. It does not fetch data from the backend resources. WordList is generally used to pass, get, or contain groups of words. In addition, there is an option to find all the words that follow a specific rule. For instance, it currently supports finding words with similar prefixes or suffixes, or words that share some substring. It is different from the WordLister service because it does not get the words from background resources and is just a wrapper class. It can also be used for word collection, as shown in Listing 1:
Listing 1. Usage of WordList as collection
WordList wordlist = wordlister.findWithPrefix("un");
// Get the words in the collection one by one
wordlist.start();
while (wordlist.hasMoreWords())
{
    Word word = wordlist.getNext();
}
// Use the WordList again to narrow down the search
WordList wordlist2 = wordlist.findWithSuffix("ing");
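To make the "container, not service" idea concrete, here is a minimal sketch of a word container that filters only the words it already holds. SimpleWordList and WordListDemo are hypothetical stand-ins written for this illustration; they are not JADT classes, and the real WordList may be implemented quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not the JADT WordList class.
class SimpleWordList {
    private final List<String> words;
    private int cursor = 0;

    SimpleWordList(List<String> words) {
        this.words = new ArrayList<>(words);
    }

    // Reset the cursor to the first word.
    void start() { cursor = 0; }

    boolean hasMoreWords() { return cursor < words.size(); }

    String getNext() { return words.get(cursor++); }

    int size() { return words.size(); }

    // Filter the words already held in memory; no backend resource is consulted.
    SimpleWordList findWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String w : words) {
            if (w.startsWith(prefix)) matches.add(w);
        }
        return new SimpleWordList(matches);
    }

    SimpleWordList findWithSuffix(String suffix) {
        List<String> matches = new ArrayList<>();
        for (String w : words) {
            if (w.endsWith(suffix)) matches.add(w);
        }
        return new SimpleWordList(matches);
    }
}

class WordListDemo {
    public static void main(String[] args) {
        SimpleWordList all = new SimpleWordList(
            Arrays.asList("undo", "undoing", "unwinding", "doing"));
        // Narrow the list twice, as Listing 1 does with the prefix and suffix filters.
        SimpleWordList narrowed = all.findWithPrefix("un").findWithSuffix("ing");
        narrowed.start();
        while (narrowed.hasMoreWords()) {
            System.out.println(narrowed.getNext());
        }
    }
}
```

Because each filter returns a new container, searches can be chained without touching the original list.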
Word
The Word interface represents a unit of language or formation of characters that native speakers can identify. Word wraps the information about the word. The information stored in Word includes word spelling, type, source, pronunciation, and records. Once the word object is obtained from one of the services, its methods can easily be used to fetch information, as shown in Listing 2:
Listing 2. Using word for fetching data
Word word=wordlist.getNext();
String strName=word.getWord();
DictionaryRecord
The DictionaryRecord object stores additional information for the word. DictionaryRecord is usually used to represent the result set obtained from one of the services implemented by a JADT driver. The record stores information about the word, such as its description, type, usage, locale, pronunciation, and so on. Because DictionaryRecord is implemented as a chained object, it can hold multiple sets of word information as a chain of records. This makes it suitable as the result set of a service, as shown in Listing 3:
Listing 3. DictionaryRecord usage
DictionaryRecord dr = dict.getMeaning("dictionary");
dr.start();
while (dr != null)
{
    String strName = dr.getWordName();
    String pronunciation = dr.getPronunciation();
    String type = dr.getType();
    String meaning = dr.getDescription();
    dr = dr.getNextRecord();
}
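The chained-object idea can be sketched as a simple singly linked list, where each record holds a reference to the next one. ChainedRecord and its append() and length() helpers are assumptions made for this illustration; they are not the JADT DictionaryRecord class, whose internals the article does not describe.

```java
// Hypothetical stand-in for illustration only; this is not the JADT DictionaryRecord class.
class ChainedRecord {
    private final String type;
    private final String description;
    private ChainedRecord next;

    ChainedRecord(String type, String description) {
        this.type = type;
        this.description = description;
    }

    String getType() { return type; }

    String getDescription() { return description; }

    // Returns the next record in the chain, or null at the end.
    ChainedRecord getNextRecord() { return next; }

    // Append a record to the end of the chain and return the head.
    ChainedRecord append(ChainedRecord record) {
        ChainedRecord tail = this;
        while (tail.next != null) {
            tail = tail.next;
        }
        tail.next = record;
        return this;
    }

    // Count the records reachable from this one, including itself.
    int length() {
        int n = 0;
        for (ChainedRecord r = this; r != null; r = r.next) {
            n++;
        }
        return n;
    }
}

class ChainedRecordDemo {
    public static void main(String[] args) {
        ChainedRecord head = new ChainedRecord("noun", "a reference book of words")
            .append(new ChainedRecord("noun", "a lookup table in software"));
        // Walk the chain until getNextRecord() returns null, as in Listing 3.
        for (ChainedRecord r = head; r != null; r = r.getNextRecord()) {
            System.out.println(r.getType() + ": " + r.getDescription());
        }
    }
}
```

With this shape, a service can return a single object that carries any number of meanings, and the caller needs no separate collection type.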
JADTDriver
As I said earlier, JADT provides driver-based access mechanisms so that the Java application programmer can work independently of the implementation provider. To do so requires a JADT driver that can communicate with the particular data source being accessed. A user's calls are delivered to the data source through one of the driver services, and the results of those statements are sent back to the user.
The implementation provider is responsible for JADTDriver, which in turn is responsible for fetching and providing the required data to you.
JADT comes bundled with two default drivers for two different kinds of data sources: JADTTextDriver, which is implemented for linguistic data stored in a text format, and JADTXMLDriver, which is implemented for linguistic data stored in an XML format.
Control flow
The JADT API defines a programming model for accessing the JADTDriver services, and a Java application developer has to follow this protocol to use JADTDriver. The simplest flow of JADT service access code is:
- Load JADTDriverFactory: JADTDriverFactory is registered automatically when its class is loaded, which you do by calling Class.forName("com.ibm.jadtdrivers.TextDriver.JADTTextDriverFactory");.
- Get JADTDriverFactory from JADTDriverFactoryManager: JADTDriverFactory is a creational factory for various drivers and is always controlled by JADTDriverFactoryManager. So JADTDriverFactoryManager can be used to obtain JADTDriverFactory by calling JADTDriverFactory fac=JADTDriverFactoryManager.getJADTDriverFactory("JADTTextDriverFactory");.
- Create JADTDriver: JADTDriver provides access to various services like dictionary, wordbook, spellchecker, wordlister, translator, grammar checker, and anagrammizer. It can be created by calling the createJADTDriver() method of the factory:
Driver driver = fac.createJADTDriver();
- Set driver properties: The driver properties are declared by the driver providers. For example, with Text and XML drivers it is very important that the user set the path to the data directories, which can be done using driver.setProperty("JADTTextDriverDir","c:\\datadir");. If this path is not set, the default value is used, which is the path to the driver class.
Now you're set to use the driver services for all the languages supported by the driver.
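If the load-then-lookup step seems opaque, the sketch below shows one common way such self-registration can work: a static initializer registers the factory with a manager as soon as Class.forName() initializes the class. Every name here (DemoDriverFactory, DemoFactoryManager, DemoTextDriverFactory, RegistrationDemo) is hypothetical, and I am assuming rather than asserting that JADT follows this pattern internally.

```java
import java.util.HashMap;
import java.util.Map;

// All names below are hypothetical stand-ins, not JADT classes.
interface DemoDriverFactory {
    String createDriverName();
}

class DemoFactoryManager {
    private static final Map<String, DemoDriverFactory> FACTORIES = new HashMap<>();

    static synchronized void register(String name, DemoDriverFactory factory) {
        FACTORIES.put(name, factory);
    }

    static synchronized DemoDriverFactory getFactory(String name) {
        DemoDriverFactory factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalStateException("No factory registered as " + name);
        }
        return factory;
    }
}

class DemoTextDriverFactory implements DemoDriverFactory {
    // Runs once, when the class is initialized (for example, by Class.forName).
    static {
        DemoFactoryManager.register("DemoTextDriverFactory", new DemoTextDriverFactory());
    }

    public String createDriverName() {
        return "text-driver";
    }
}

class RegistrationDemo {
    static DemoDriverFactory load() {
        try {
            // Calling getName() on a class literal does not initialize the class;
            // Class.forName does, which triggers the static block above.
            Class.forName(DemoTextDriverFactory.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Factory class not on the classpath", e);
        }
        return DemoFactoryManager.getFactory("DemoTextDriverFactory");
    }

    public static void main(String[] args) {
        System.out.println(load().createDriverName());
    }
}
```

The application code never names the concrete factory type after loading it, which is what lets a driver be swapped by changing a class name string.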
Accessing the dictionary
A dictionary is one of the most important services provided by the JADT driver, and both text and XML drivers support this service. It can be obtained from the driver for a specific language using the following statement:
Dictionary dict=driver.getDictionary("english","english");
The getMeaning() method fetches a DictionaryRecord for each word you specify, which contains multiple meanings of the word. Listing 4 shows the dictionary usage:
Listing 4. Dictionary usage
DictionaryRecord dr = dict.getMeaning("dictionary");
dr.start();
while (dr != null)
{
    String strName = dr.getWordName();
    String pronunciation = dr.getPronunciation();
    String type = dr.getType();
    String meaning = dr.getDescription();
    dr = dr.getNextRecord();
}
Using WordBook
WordBook is a service component that provides a classified list of related words and information about them. These words can be related according to usage, origin, sound, and so on.
This service can be accessed from the driver with the following code:
WordBook wordbook=driver.getWordBook("english");
Both text and XML drivers support this service.
The current version of JADT provides interfaces for implementing the following services:
Synonyms
Synonyms are two words that are interchangeable in a given context. There are two methods that provide this service:
getSynonyms() is used to get the synonyms
isSynonyms() is used to determine whether two words are synonyms
Listing 5 shows the methods in action:
Listing 5. WordBook usage to check synonyms
DictionaryRecord dr = wordbook.getSynonyms("dictionary");
while (dr != null)
{
    String strName = dr.getWordName();
    dr = dr.getNextRecord();
}
if (wordbook.isSynonyms("dictionary", "lexicon"))
{
    /*..*/
}
This same technique is used for the remaining services in this section, so I won't be presenting it for each.
Antonyms
Antonyms are two words that are opposites. Using WordBook, you can check if words are antonyms or you can find all the antonyms of the word. Again, there are two methods that provide this service:
getAntonyms() is used to get the antonyms
isAntonyms() is used to determine whether two words are antonyms
Hypernyms
Hypernyms are words that refer to broad categories or generic concepts. "Computer" and "fruit" are hypernyms for more specific terms like "Dell" and "banana." You can use WordBook to get all the hypernyms of a word or check whether words are hypernyms. There are two methods that provide this service:
getHypernym() is used to get the hypernym
isHypernym() is used to determine whether two words are hypernyms
Hyponyms
Hyponyms are words that refer to more specific words or concepts. Proper nouns are good examples of hyponyms. "North America" and "Mercedes" are hyponyms for "continent" and "automobile."
You can use WordBook to check whether a word is a hyponym or to find all the hyponyms of a word. There are two methods that provide this service:
getHyponym() is used to get the hyponym
isHyponym() is used to determine whether two words are hyponyms
Holonyms
Holonyms are words that name the whole of which a given word is a part. For instance, "hat" is a holonym of "brim" and "crown." There are two methods that provide this service:
getHolonym() is used to get the holonym
isHolonym() is used to determine whether two words are holonyms
Meronyms
Meronyms are words that name a part of a given word. For instance, "brim" and "crown" are meronyms of "hat." There are two methods that provide this service:
getMeronym() is used to get the meronym
isMeronym() is used to determine whether two words are meronyms
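Several of these relations come in inverse pairs: if "brim" is a meronym of "hat," then "hat" is a holonym of "brim." The sketch below illustrates that symmetry with a small in-memory store that records each part-whole link in both directions. PartWholeStore and its methods are hypothetical names chosen for this example and say nothing about how a JADT driver actually stores its data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory store for illustration; not part of the JADT API.
class PartWholeStore {
    private final Map<String, Set<String>> partsOfWhole = new HashMap<>();
    private final Map<String, Set<String>> wholesOfPart = new HashMap<>();

    // Record that 'part' is a part of 'whole', updating both directions.
    void addPart(String whole, String part) {
        partsOfWhole.computeIfAbsent(whole, k -> new TreeSet<>()).add(part);
        wholesOfPart.computeIfAbsent(part, k -> new TreeSet<>()).add(whole);
    }

    // Parts of the given whole, in alphabetical order.
    Set<String> getMeronyms(String whole) {
        return partsOfWhole.getOrDefault(whole, Collections.<String>emptySet());
    }

    // Wholes that contain the given part, in alphabetical order.
    Set<String> getHolonyms(String part) {
        return wholesOfPart.getOrDefault(part, Collections.<String>emptySet());
    }

    boolean isMeronym(String part, String whole) {
        return getMeronyms(whole).contains(part);
    }

    boolean isHolonym(String whole, String part) {
        return getHolonyms(part).contains(whole);
    }
}

class PartWholeDemo {
    public static void main(String[] args) {
        PartWholeStore store = new PartWholeStore();
        store.addPart("hat", "brim");
        store.addPart("hat", "crown");
        System.out.println("Meronyms of hat: " + store.getMeronyms("hat"));
        System.out.println("Holonyms of brim: " + store.getHolonyms("brim"));
    }
}
```

The same two-map approach would apply to the hypernym and hyponym pair, with "category" and "member" in place of "whole" and "part."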
Using SpellChecker
You can use SpellChecker to catch misspelled words. It is a useful tool for editors, IDEs, and other word-processing applications.
You can access SpellChecker by getting the service instance from the driver, as shown below. Both text and XML drivers support this service:
SpellChecker spellchecker=driver.getSpellChecker("english");
JADT SpellChecker provides features to:
- Find the correctness of a word: Call the check() method of SpellChecker. It takes a word as a parameter and returns a boolean:
System.out.println("Word dictionar is spelled correctly: " + spellchecker.check(new TextWord("dictionar")));
- Find the correct spelling for a word: Call the correct() method of SpellChecker. It takes a word as a parameter and returns a DictionaryRecord:
DictionaryRecord dr = spellchecker.correct(new TextWord("dictionar"));
if (dr != null)
    System.out.println("Correct spelling of dictionar is " + dr);
- Find similar words: Call the suggestSimilar() method of SpellChecker. It takes a word as a parameter and returns a DictionaryRecord:
DictionaryRecord dr = spellchecker.suggestSimilar(new TextWord("dictionar"));
System.out.println("Words similar to dictionar are : ");
dr.start();
while (dr.hasMoreWords())
    System.out.println(dr.getNext());
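The article does not say how a driver decides which words are "similar." One plausible approach, shown here purely as an assumption, is to compare the input against known words using Levenshtein edit distance. SketchSpellChecker, its word list, and the maxDistance parameter are all hypothetical and are not part of the JADT SpellChecker interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the matching strategy of real JADT drivers is not documented here.
class SketchSpellChecker {
    // A sorted set keeps suggestions in a deterministic, alphabetical order.
    private final Set<String> known;

    SketchSpellChecker(Collection<String> words) {
        this.known = new TreeSet<>(words);
    }

    // A word is considered correct if it appears in the known-word set.
    boolean check(String word) {
        return known.contains(word);
    }

    // Return the known words within maxDistance single-character edits of the input.
    List<String> suggestSimilar(String word, int maxDistance) {
        List<String> suggestions = new ArrayList<>();
        for (String candidate : known) {
            if (distance(word, candidate) <= maxDistance) {
                suggestions.add(candidate);
            }
        }
        return suggestions;
    }

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }
}

class SpellCheckDemo {
    public static void main(String[] args) {
        SketchSpellChecker checker = new SketchSpellChecker(
            Arrays.asList("dictionary", "diction", "direction"));
        System.out.println("dictionar known? " + checker.check("dictionar"));
        System.out.println("Suggestions: " + checker.suggestSimilar("dictionar", 1));
    }
}
```

Scanning every known word is fine for a small list; a production driver would likely need an index to keep lookups fast on a full dictionary.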
Using WordLister
WordLister allows you to get the words from the backend resource. JADT WordLister also provides an option to find all words that follow a certain rule.
This service can be accessed from the driver with the following code:
WordLister wordlister=driver.getWordlister("english");
Currently, it supports finding words with:
- Similar prefixes: Returns the words that start with the given prefix:
WordList wordlist = wordlister.findWithPrefix("perf");
System.out.println("Words with prefix \"perf\" are : ");
if (wordlist == null) return;
wordlist.start();
while (wordlist.hasMoreWords())
{
    System.out.println(wordlist.getNext());
}
- Similar suffixes: Returns the words that end with the given suffix:
WordList wordlist = wordlister.findWithSuffix("ces");
System.out.println("Words with suffix \"ces\" are : ");
if (wordlist == null) return;
wordlist.start();
while (wordlist.hasMoreWords())
{
    System.out.println(wordlist.getNext());
}
- A shared substring: Returns the words that contain the given substring:
WordList wordlist = wordlister.findWithSubstring("tiona");
System.out.println("Words with substring \"tiona\" are : ");
if (wordlist == null) return;
wordlist.start();
while (wordlist.hasMoreWords())
{
    System.out.println(wordlist.getNext());
}
Using Anagrammizer
An anagram is a word or phrase spelled by rearranging the letters of another word or phrase. Anagrammizer is used to get words formed by the same characters and can be useful for word game applications. With Anagrammizer, you can determine whether words are anagrams of each other or find all the anagrams to a particular word.
This service can be accessed from the driver using the following statement:
Anagrammizer anagram=driver.getAnagrammizer("english");
There are two methods that provide this service:
Anagrammise() is used to get the anagrams
isAnagram() is used to determine whether two words are anagrams
These methods can be used as shown in Listing 6:
Listing 6. Anagrammizer usage
DictionaryRecord dr = anagram.Anagrammise(new TextWord("clear"));
while (dr != null)
{
    String strName = dr.getWordName();
    dr = dr.getNextRecord();
}
if (anagram.isAnagram("clear", "clare"))
{
    /*..*/
}
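For readers curious about what an anagram test involves, a common technique is to sort the letters of each word and compare the results. The sketch below uses that approach; AnagramSketch, signature(), and anagramsOf() are hypothetical names for this example, and a real JADT driver may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not the JADT Anagrammizer implementation.
class AnagramSketch {
    // Canonical form of a word: its lowercased letters in sorted order.
    static String signature(String word) {
        char[] chars = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Two words are anagrams when their sorted letters match.
    static boolean isAnagram(String a, String b) {
        return signature(a).equals(signature(b));
    }

    // Collect the candidates that are anagrams of the word, skipping the word itself.
    static List<String> anagramsOf(String word, List<String> candidates) {
        List<String> matches = new ArrayList<>();
        String target = signature(word);
        for (String candidate : candidates) {
            if (!candidate.equalsIgnoreCase(word) && signature(candidate).equals(target)) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}

class AnagramDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("clare", "clean", "clear", "Carle");
        System.out.println("clear/clare anagrams? " + AnagramSketch.isAnagram("clear", "clare"));
        System.out.println("Anagrams of clear: " + AnagramSketch.anagramsOf("clear", words));
    }
}
```

A driver that serves many lookups could precompute the signature of every dictionary word once and index words by it, so that finding all anagrams becomes a single map lookup.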
Using GrammarChecker
GrammarChecker checks whether the arrangement of words in a sentence is admissible and whether each word is used correctly in its context. Again, this service can be used in publishing and word-processing applications.
This service can be accessed from the driver with the following code:
GrammarChecker grammarchecker=driver.getGrammarChecker("english");
With JADT GrammarChecker, you can:
- Use the check() method to determine whether the correct grammar is used
- Use the correct() method to correct the grammar
- Use the suggestSimilar() method to suggest how to correct the grammar in a specific context
Using Translator
Translator is used to convert words or sentences from one language to another. This feature can be used in localization and internationalization implementations. For example, resource bundle files written in one language can be converted to another using this feature.
This service can be accessed from the driver with the following code:
Translator translator=driver.getTranslator("english","french");
With JADT Translator, you can:
- Use the translate() method to translate a word to a second language
- Use the translateSentence() method to translate a sentence to a second language
A sample JADT application
JADT also ships with a sample application that demonstrates the features of the JADTTextDriver and JADTXMLDriver drivers.
To execute the sample application, run java com.ibm.jadtsample.JADTSampleApplication to open the application, which will look like Figure 1:
Figure 1. Sample application screen
Make sure that you execute the java com.ibm.jadtsample.JADTSampleApplication command from the directory in which your resource files are placed.
Summary
After reading this article, you should have a good idea of how the Dictionary and Thesaurus API for Java works from the user's perspective. Specifically, you learned the basic JADT structure and how to use various JADT services and components to build a Java application using dictionary, thesaurus, and other word-related features.
In Part 2 of this series, I'll look at JADT from the dictionary provider's perspective and show how providers can implement JADT for their dictionary and word data.
Resources
- Download the Dictionary and Thesaurus API for Java from alphaWorks.
- Don't miss Part 2 of this series (developerWorks, September 2004), which looks at word references in detail.
About the author
Rakesh Midha is a software engineer with IBM Software Labs, Bangalore. He is currently working on IBM WebSphere Business Components development. He has five years of technical experience in Java and C++ server-side programming on multiple platforms and various relational database systems like DB2 UDB, Oracle, MySQL, and Microsoft SQL Server. His areas of expertise include designing and developing stand-alone and n-tier distributed applications in the fields of banking, finance, the catalog industry, and order and warehouse management systems. He holds a Bachelor's degree in Electronics Engineering from the Punjab University, Chandigarh, and is the technologist of Dictionary and Thesaurus for Java, launched at IBM alphaWorks.