Getting started with JADT, Part 1

Using the Dictionary and Thesaurus API for Java in your Java applications

Level: Intermediate

Rakesh Midha (midharakesh@in.ibm.com), Software Engineer, IBM Software Labs, Bangalore

17 Sep 2004

The Dictionary and Thesaurus API for Java (JADT) is a class library for accessing linguistic features in your Java applications. In this article, Part 1 of a two-part series, JADT developer Rakesh Midha discusses JADT from the user's point of view and shows how it is used in a Java application. He discusses the various classes and designs involved in JADT, along with features available with it. Part 2 goes into more detail about the architecture and API, and provides guidance for those wishing to implement a JADT driver.

The Dictionary and Thesaurus API for Java (JADT), an API for wordbook features published on alphaWorks, is a standards-based class library for accessing linguistic features in Java applications. It gives Java programmers a transparent, Java-centric way to access dictionary and nonstructural words and information about them. This article, the first of two, covers using JADT to develop a dictionary- and thesaurus-enriched Java application. The second article provides a detailed view of the architecture and API.

Overview of JADT

JADT is an API used to access the wordbook data from the Java programming language. It provides generic interfaces that can be used in your Java applications, independent of the implementation of a dictionary/data provider. Using JADT, applications written in the Java language have access to definitions, pronunciations, synonyms, antonyms, and so on for a particular word.

JADT's features include:

  • An interface for accessing dictionary, thesaurus, and other services
  • An interface for multilingual conversion
  • A driver-based access protocol
  • APIs for dictionary providers or driver developers
  • Access to backend dictionaries or thesauruses, such as databases, file systems, or XML-based files (provided that their drivers are implemented)

JADT is useful for applications that access wordbook data because it provides simple, standard techniques for the task. It saves application programmers from the headache of maintaining the data and its access points.

The implementation of this API can be very useful for a range of applications that use multilingual features, localization, and so on. For example, an editor can use this technology for spellchecking or to get suggestions for more appropriate words during editing. Another example use might be to enable an application that requires translations to use a multilingual dictionary and thesaurus.

In addition, JADT provides a language-neutral interface, which makes it useful for a variety of languages and locales. Also, because JADT is written in the Java language, and its sample drivers are implemented in the Java language, it's platform-independent.





JADT data structures

JADT includes various components and services, depending upon the feature or particular subsection of JADT being used. The first stop on our tour of JADT is a set of classes and interfaces that offer a generic way to organize data. Because all the services use these data structures, it is very important that JADT users are aware of them.

WordList

The WordList class is just a container for words; it does not fetch data from the backend resources. WordList is generally used to pass, get, or contain groups of words. It can also find all the words it holds that follow a specific rule: it currently supports finding words with a given prefix or suffix, or words that contain a given substring. It differs from the WordLister service in that it is only a wrapper class and does not get words from backend resources. It can also be used as a word collection, as shown in Listing 1:


Listing 1. Using WordList as a collection

// Fetch every word that starts with "un" from the WordLister service
WordList wordlist = wordlister.findWithPrefix("un");

// Walk through the collection one word at a time
wordlist.start();
while (wordlist.hasMoreWords())
{
    Word word = wordlist.getNext();
}

// Use the WordList itself to narrow the search further
WordList wordlist2 = wordlist.findWithSuffix("ing");
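To make the container idea concrete, here is a minimal, self-contained sketch of a WordList-style class. SimpleWordList and its toList() helper are hypothetical names invented for this illustration; only the method names start(), hasMoreWords(), getNext(), findWithPrefix(), and findWithSuffix() are borrowed from Listing 1, and the real JADT class may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a WordList-style container; this is not the JADT class.
class SimpleWordList {
    private final List<String> words;
    private int cursor = 0;

    SimpleWordList(List<String> words) {
        this.words = new ArrayList<>(words);
    }

    // Move the cursor back to the first word.
    void start() {
        cursor = 0;
    }

    boolean hasMoreWords() {
        return cursor < words.size();
    }

    String getNext() {
        return words.get(cursor++);
    }

    // Filter the words already held in memory; no backend is consulted.
    SimpleWordList findWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String w : words) {
            if (w.startsWith(prefix)) {
                matches.add(w);
            }
        }
        return new SimpleWordList(matches);
    }

    SimpleWordList findWithSuffix(String suffix) {
        List<String> matches = new ArrayList<>();
        for (String w : words) {
            if (w.endsWith(suffix)) {
                matches.add(w);
            }
        }
        return new SimpleWordList(matches);
    }

    // Copy of the contents, handy for inspection.
    List<String> toList() {
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        SimpleWordList all = new SimpleWordList(
                Arrays.asList("undoing", "unfair", "doing", "untying"));
        SimpleWordList narrowed = all.findWithPrefix("un").findWithSuffix("ing");
        narrowed.start();
        while (narrowed.hasMoreWords()) {
            System.out.println(narrowed.getNext());
        }
    }
}
```

Because each find method returns a new container, the calls can be chained to narrow a result step by step without touching the original list.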

Word

The Word interface represents a unit of language or formation of characters that native speakers can identify. Word wraps the information about the word. The information stored in Word includes word spelling, type, source, pronunciation, and records. Once the word object is obtained from one of the services, its methods can easily be used to fetch information, as shown in Listing 2:


Listing 2. Using Word to fetch data

Word word = wordlist.getNext();
String strName = word.getWord();

DictionaryRecord

The DictionaryRecord object stores additional information for the word. DictionaryRecord is usually used to represent the result set of words obtained from one of the services implemented by a JADT driver. The record stores information about the word, such as its description, type, usage, locale, pronunciation, and so on. Because DictionaryRecord is implemented as a chainedObject, it can hold multiple sets of word information in the form of chained objects. This makes it suitable as the result set of a service, as shown in Listing 3:


Listing 3. DictionaryRecord usage
 
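// Look up the word; the result is the head of a chain of records
DictionaryRecord dr = dict.getMeaning("dictionary");
dr.start();
// Visit each record in the chain until there are no more
while (dr != null)
{
    String strName = dr.getWordName();
    String pronunciation = dr.getPronunciation();
    String type = dr.getType();
    String meaning = dr.getDescription();
    // Advance to the next record; null marks the end of the chain
    dr = dr.getNextRecord();
}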
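The chained-object idea can be sketched in a few lines. ChainedRecord, append(), and count() below are hypothetical names made up for this illustration; only getDescription() and getNextRecord() echo Listing 3, and nothing here claims to match how JADT implements DictionaryRecord.

```java
// Hypothetical chained record; it only loosely mirrors the DictionaryRecord usage above.
class ChainedRecord {
    private final String description;
    private ChainedRecord next;

    ChainedRecord(String description) {
        this.description = description;
    }

    // Attach a record to the end of the chain and return the head.
    ChainedRecord append(ChainedRecord tail) {
        ChainedRecord cur = this;
        while (cur.next != null) {
            cur = cur.next;
        }
        cur.next = tail;
        return this;
    }

    String getDescription() {
        return description;
    }

    // Returns null when this record is the last one in the chain.
    ChainedRecord getNextRecord() {
        return next;
    }

    // Walk the chain from the given head and count its records.
    static int count(ChainedRecord head) {
        int n = 0;
        for (ChainedRecord r = head; r != null; r = r.getNextRecord()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ChainedRecord head = new ChainedRecord("a reference book of words")
                .append(new ChainedRecord("a list of terms for a subject"));
        for (ChainedRecord r = head; r != null; r = r.getNextRecord()) {
            System.out.println(r.getDescription());
        }
        System.out.println("records: " + count(head));
    }
}
```

A single reference to the head is enough to reach every meaning, which is why a loop of the form while (dr != null) works as a result-set traversal.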





JADTDriver

As I said earlier, JADT provides driver-based access mechanisms so that the Java application programmer can work independently of the implementation provider. To do so requires a JADT driver that can communicate with the particular data source being accessed. A user's calls are delivered to the data source through one of the driver services, and the results of those calls are sent back to the user.

The implementation provider is responsible for JADTDriver, which in turn is responsible for fetching and providing the required data to you.

JADT comes with two default drivers, for two different kinds of data sources bundled with the JADT API. These two drivers are JADTTextDriver, which is implemented for linguistic data stored in a text format, and JADTXMLDriver, which is implemented for linguistic data stored in an XML format.





Control flow

The JADT API defines a programming model for accessing the JADTDriver services, and Java application developers must follow this protocol to use JADTDriver. The simplest flow of JADT service access code is:

  • Load JADTDriverFactory: Load the factory class by calling Class.forName("com.ibm.jadtdrivers.TextDriver.JADTTextDriverFactory");. The factory is registered automatically while the class loads.

  • Get JADTDriverFactory from JADTDriverFactoryManager: JADTDriverFactory is a creational factory for various drivers and is always controlled by JADTDriverFactoryManager. So JADTDriverFactoryManager can be used to obtain JADTDriverFactory by calling JADTDriverFactory fac=JADTDriverFactoryManager.getJADTDriverFactory("JADTTextDriverFactory");.

  • Create JADTDriver: JADTDriver provides access to various services like dictionary, wordbook, spellchecker, wordlister, translator, grammar checker, and anagrammizer. It can be created by calling the createJADTDriver() method of factory:

    JADTDriver driver = fac.createJADTDriver();
    



  • Set driver properties: The driver properties are declared by the driver providers. For example, with Text and XML drivers it is very important that the user set the path to the data directories, which can be done using driver.setProperty("JADTTextDriverDir","c:\\datadir");. If this path is not set, the default value will be picked, which is the path to the driver class.

Now you're set to use the driver services for all the languages supported by the driver.
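The "register while loading" step follows a common Java pattern: a static initializer in the factory class registers an instance with a manager, and Class.forName() triggers that initializer. The sketch below illustrates the pattern only. Every name in it (DriverRegistrySketch, Factory, Manager, TextFactory, createDriverName) is hypothetical, and it is an assumption, not a documented fact, that JADT's factories register themselves this way.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of self-registering factories; all names here are hypothetical.
class DriverRegistrySketch {

    // Stand-in for a driver factory interface.
    interface Factory {
        String createDriverName();
    }

    // Stand-in for a factory manager that looks factories up by name.
    static final class Manager {
        private static final Map<String, Factory> FACTORIES = new HashMap<>();

        static synchronized void register(String name, Factory factory) {
            FACTORIES.put(name, factory);
        }

        static synchronized Factory get(String name) {
            Factory factory = FACTORIES.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("No factory registered as " + name);
            }
            return factory;
        }
    }

    static final class TextFactory implements Factory {
        // Runs once, when the class is initialized (for example by Class.forName).
        static {
            Manager.register("TextFactory", new TextFactory());
        }

        @Override
        public String createDriverName() {
            return "text-driver";
        }
    }

    // Load the factory class by name, then fetch it from the manager.
    static Factory load() {
        try {
            // Class.forName initializes the class, which runs the static block above.
            Class.forName(TextFactory.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return Manager.get("TextFactory");
    }

    public static void main(String[] args) {
        Factory factory = load();
        System.out.println(factory.createDriverName());
    }
}
```

The benefit of this arrangement is that application code names the factory only as a string, so swapping one driver for another does not require recompiling against a different implementation class.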





Accessing the dictionary

A dictionary is one of the most important services provided by the JADT driver, and both text and XML drivers support this service. It can be obtained from the driver for a specific language using the following statement:

 
Dictionary dict=driver.getDictionary("english","english");

The getMeaning() method fetches a DictionaryRecord for each word you specify, which contains multiple meanings of the word. Listing 4 shows the dictionary usage:


Listing 4. Dictionary usage

DictionaryRecord dr = dict.getMeaning("dictionary");
dr.start();
while (dr != null)
{
    String strName = dr.getWordName();
    String pronunciation = dr.getPronunciation();
    String type = dr.getType();
    String meaning = dr.getDescription();
    dr = dr.getNextRecord();
}





Using WordBook

WordBook is a service component that provides a classified list of the words related to a given word. Words can be related according to usage, origin, sound, and so on.

This service can be accessed from the driver with the following code:

WordBook wordbook=driver.getWordBook("english");

Both text and XML drivers support this service.

The current version of JADT provides interfaces for implementing the following services:

Synonyms
Synonyms are two words that are interchangeable in a given context. There are two methods that provide this service:

  • getSynonyms() is used to get the synonyms
  • isSynonyms() is used to determine whether two words are synonyms

Listing 5 shows the methods in action:


Listing 5. WordBook usage to check synonyms

// Get the chain of synonyms and visit each record
DictionaryRecord dr = wordbook.getSynonyms("dictionary");
while (dr != null)
{
    String strName = dr.getWordName();
    dr = dr.getNextRecord();
}
// Check whether two specific words are synonyms
if (wordbook.isSynonyms("dictionary", "lexicon"))
{
    /*..*/
}

This same technique is used for the remaining services in this section, so I won't be presenting it for each.

Antonyms
Antonyms are two words that are opposites. Using WordBook, you can check if words are antonyms or you can find all the antonyms of the word. Again, there are two methods that provide this service:

  • getAntonyms() is used to get the antonyms
  • isAntonyms() is used to determine whether two words are antonyms

Hypernyms
Hypernyms are words that refer to broad categories or generic concepts. "Computer" or "fruit" are hypernyms for more specific terms like "Dell" or "banana." You can use WordBook to get all the hypernyms or check if words are hypernyms. There are two methods that provide this service:

  • getHypernym() is used to get the hypernyms of a word
  • isHypernym() is used to determine whether one word is a hypernym of another

Hyponyms
Hyponyms are words that refer to more specific words or concepts. Proper nouns are good examples of hyponyms. "North America" or "Mercedes" are hyponyms for "continent" or "automobile."

You can use WordBook to check whether a word is a hyponym or you can find all the hyponyms to a word. There are two methods that provide this service:

  • getHyponym() is used to get the hyponyms of a word
  • isHyponym() is used to determine whether one word is a hyponym of another

Holonyms
Holonyms are words that name the whole of which a given word is a part. For instance, "hat" is a holonym of "brim" and "crown." There are two methods that provide this service:

  • getHolonym() is used to get the holonyms of a word
  • isHolonym() is used to determine whether one word is a holonym of another

Meronyms
Meronyms are words that name a part of what a given word names. For instance, "brim" and "crown" are meronyms of "hat." There are two methods that provide this service:

  • getMeronym() is used to get the meronyms of a word
  • isMeronym() is used to determine whether one word is a meronym of another
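Unlike synonyms and antonyms, the last four relations are directional and come in inverse pairs: if "fruit" is a hypernym of "banana," then "banana" is a hyponym of "fruit," and likewise for holonyms and meronyms. The sketch below models that with a small in-memory map. DirectedRelation and its method names are hypothetical, as is the argument order; it does not reflect how a JADT driver stores these relations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store for one directed word relation (not a JADT class).
class DirectedRelation {
    // Maps the broader (or whole) term to its narrower (or part) terms.
    private final Map<String, Set<String>> narrower = new HashMap<>();

    void add(String broad, String narrow) {
        narrower.computeIfAbsent(broad, k -> new HashSet<>()).add(narrow);
    }

    // True if 'broad' is recorded as the hypernym (or holonym) of 'narrow'.
    boolean isBroaderOf(String broad, String narrow) {
        return narrower.getOrDefault(broad, Collections.<String>emptySet()).contains(narrow);
    }

    // The inverse question: is 'narrow' a hyponym (or meronym) of 'broad'?
    boolean isNarrowerOf(String narrow, String broad) {
        return isBroaderOf(broad, narrow);
    }

    public static void main(String[] args) {
        DirectedRelation kinds = new DirectedRelation(); // hypernym -> hyponym
        kinds.add("fruit", "banana");

        DirectedRelation parts = new DirectedRelation(); // holonym -> meronym
        parts.add("hat", "brim");
        parts.add("hat", "crown");

        System.out.println(kinds.isBroaderOf("fruit", "banana"));  // hypernym check
        System.out.println(kinds.isNarrowerOf("banana", "fruit")); // hyponym check
        System.out.println(parts.isBroaderOf("hat", "brim"));      // holonym check
        System.out.println(parts.isNarrowerOf("crown", "hat"));    // meronym check
    }
}
```

Storing each pair once and answering the inverse question by swapping arguments keeps the two directions from drifting out of sync.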




Using SpellChecker

You can use SpellChecker to catch misspelled words. It is a useful tool for editors, IDEs, and other word-processing applications.

You can access SpellChecker by getting the service instance from the driver, as shown below. Both text and XML drivers support this service:

SpellChecker spellchecker=driver.getSpellChecker("english");

JADT SpellChecker provides features to:

  • Check whether a word is spelled correctly: Call the check() method of SpellChecker. It takes a word as a parameter and returns a boolean:

    System.out.println("Is dictionar spelled correctly? "
      + spellchecker.check(new TextWord("dictionar")));
    



  • Find the correct spelling for a word: Call the correct() method of SpellChecker. It returns DictionaryRecord and takes a word as a parameter:

    DictionaryRecord dr = spellchecker.correct(new TextWord("dictionar"));
    if (dr != null)
        System.out.println("Correct spelling of dictionar is " + dr);
    



  • Find similar words: Call the suggestSimilar() method of SpellChecker. It returns DictionaryRecord and takes a word as a parameter:

    DictionaryRecord dr = spellchecker.suggestSimilar(new TextWord("dictionar"));
    System.out.println("Words similar to dictionar are: ");
    // DictionaryRecord is a chain, so walk it with getNextRecord()
    while (dr != null)
    {
        System.out.println(dr.getWordName());
        dr = dr.getNextRecord();
    }
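The article does not say how a driver decides which words are "similar." One common approach, shown here purely as an assumption, is to rank known words by Levenshtein edit distance from the input. SpellCheckSketch, its constructor, and the maxDistance parameter are hypothetical; only the method names check() and suggestSimilar() echo the service described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical spell-check sketch based on edit distance; not the JADT implementation.
class SpellCheckSketch {
    private final Set<String> known;

    SpellCheckSketch(Collection<String> knownWords) {
        this.known = new TreeSet<>(knownWords); // sorted, so suggestions come out in order
    }

    // A word is "correct" here if it appears in the known-word set.
    boolean check(String word) {
        return known.contains(word);
    }

    // Levenshtein distance: minimum number of insertions, deletions, and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Return every known word within maxDistance edits of the input.
    List<String> suggestSimilar(String word, int maxDistance) {
        List<String> suggestions = new ArrayList<>();
        for (String candidate : known) {
            if (editDistance(word, candidate) <= maxDistance) {
                suggestions.add(candidate);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        SpellCheckSketch checker = new SpellCheckSketch(
                Arrays.asList("dictionary", "diction", "lexicon"));
        System.out.println(checker.check("dictionar"));
        System.out.println(checker.suggestSimilar("dictionar", 1));
    }
}
```

Scanning the whole word set is fine for a small list; a real driver with a large dictionary would likely need an index rather than a linear pass.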
    





Using WordLister

WordLister allows you to get the words from the backend resource. JADT WordLister also provides an option to find all words that follow a certain rule.

This service can be accessed from the driver with the following code:

WordLister wordlister=driver.getWordlister("english"); 

Currently, it supports finding words with:

  • Similar prefixes: Returns the words that start with the given prefix:

    WordList dr = wordlister.findWithPrefix("perf");
    System.out.println("Words with prefix \"perf\" are : ");
    if (dr == null) return;
    dr.start();
    while (dr.hasMoreWords())
    {
        System.out.println(dr.getNext());
    }
    



  • Similar suffixes: Returns the words that end with the given suffix:

    WordList dr = wordlister.findWithSuffix("ces");
    System.out.println("Words with suffix \"ces\" are : ");
    if (dr == null) return;
    dr.start();
    while (dr.hasMoreWords())
    {
        System.out.println(dr.getNext());
    }
    



  • A shared substring: Returns the words that contain the given substring:

    WordList dr = wordlister.findWithSubstring("tiona");
    System.out.println("Words with substring \"tiona\" are : ");
    if (dr == null) return;
    dr.start();
    while (dr.hasMoreWords())
    {
        System.out.println(dr.getNext());
    }
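What separates WordLister from WordList is that it reads from a backend resource. The sketch below assumes, for illustration only, a backend that is a plain text source with one word per line, and scans it with a rule. WordListerSketch and its find() helper are hypothetical names, and the actual file formats used by the text and XML drivers are not described in this article.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical lister that scans a one-word-per-line source; not the JADT implementation.
class WordListerSketch {

    // Read the source line by line and keep the words that satisfy the rule.
    static List<String> find(Reader source, Predicate<String> rule) {
        List<String> matches = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty() && rule.test(word)) {
                    matches.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    static List<String> findWithPrefix(Reader source, String prefix) {
        return find(source, w -> w.startsWith(prefix));
    }

    static List<String> findWithSuffix(Reader source, String suffix) {
        return find(source, w -> w.endsWith(suffix));
    }

    static List<String> findWithSubstring(Reader source, String part) {
        return find(source, w -> w.contains(part));
    }

    public static void main(String[] args) {
        // A StringReader stands in for a word file on disk.
        String data = "national\nperfect\nperform\nrational\nvoices\n";
        System.out.println(findWithPrefix(new StringReader(data), "perf"));
        System.out.println(findWithSuffix(new StringReader(data), "ces"));
        System.out.println(findWithSubstring(new StringReader(data), "tiona"));
    }
}
```

Passing the rule as a predicate keeps the three searches to one scanning loop, so adding another kind of rule would not require another copy of the I/O code.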
    





Using Anagrammizer

An anagram is a word or phrase spelled by rearranging the letters of another word or phrase. Anagrammizer is used to get words formed from the same characters and can be useful for word game applications. With Anagrammizer, you can determine whether words are anagrams of each other or find all the anagrams of a particular word.

This service can be accessed from the driver using the following statement:

Anagrammizer anagram=driver.getAnagrammizer("english");

There are two methods that provide this service:

  • Anagrammise() is used to get the anagrams
  • isAnagram() is used to determine whether two words are anagrams

These methods can be used as shown in Listing 6:


Listing 6. Anagrammizer usage

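// Get the chain of anagrams from the Anagrammizer service and visit each record
DictionaryRecord dr = anagram.Anagrammise(new TextWord("clear"));
while (dr != null)
{
    String strName = dr.getWordName();
    dr = dr.getNextRecord();
}
// Check whether two specific words are anagrams of each other
if (anagram.isAnagram("clear", "clare"))
{
    /*..*/
}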
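For readers curious about what an anagram test involves, a standard technique is to sort each word's letters and compare the results. The sketch below uses that technique; AnagramSketch, key(), and anagramsOf() are hypothetical names, and the article does not state which method JADT's drivers use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical anagram helper using sorted-letter keys; not the JADT implementation.
class AnagramSketch {

    // Canonical key: the word's letters, lowercased and sorted.
    static String key(String word) {
        char[] letters = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Two words are anagrams when their sorted-letter keys match.
    static boolean isAnagram(String a, String b) {
        return key(a).equals(key(b));
    }

    // Collect the candidates that are anagrams of the word, excluding the word itself.
    static List<String> anagramsOf(String word, Collection<String> candidates) {
        String target = key(word);
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (!candidate.equalsIgnoreCase(word) && key(candidate).equals(target)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("clear", "clare"));
        System.out.println(anagramsOf("clear", Arrays.asList("clare", "lacer", "clean", "clear")));
    }
}
```

Because the key depends only on the letters, a word list can be grouped by key once and then queried for anagrams without re-sorting every candidate.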





Using GrammarChecker

GrammarChecker checks whether the words in a sentence are arranged admissibly and whether each word is used correctly in its context. Again, this service can be used in publishing and word-processing applications.

This service can be accessed from the driver with the following code:

GrammarChecker grammarchecker=driver.getGrammarChecker("english");

With JADT GrammarChecker, you can:

  • Use the check() method to determine whether the correct grammar is used
  • Use the correct() method to correct the grammar
  • Use the suggestSimilar() method to suggest how to correct the grammar in a specific context




Using Translator

Translator is used to convert words or sentences from one language to another. This feature can be used in localization and internationalization implementations. For example, resource bundle files written in one language can be converted to another using this feature.

This service can be accessed from the driver with the following code:

Translator translator=driver.getTranslator("english","french");

With JADT Translator, you can:

  • Use the translate() method to translate a word to a second language
  • Use the translateSentence() method to translate a sentence to a second language




A sample JADT application

JADT is also shipped with a sample application to demonstrate the features of the JADTTextDriver and JADTXMLDriver drivers.

To execute the sample application, call java com.ibm.jadtsample.JADTSampleApplication to open the application, which will look like Figure 1:


Figure 1. Sample application screen

Make sure that you execute the java com.ibm.jadtsample.JADTSampleApplication command from the directory in which your resource files are placed.





Summary

After reading this article, you should have a good idea of how the Dictionary and Thesaurus API for Java works from the user's perspective. Specifically, you learned the basic JADT structure and how to use various JADT services and components to build a Java application that uses dictionary, thesaurus, and other word-related features.

In Part 2 of this series, I'll look at JADT from the dictionary provider's perspective and show how providers can implement JADT for their dictionary and word data.




About the author

Rakesh Midha

Rakesh Midha is a software engineer with IBM Software Labs, Bangalore. He is currently working on IBM WebSphere Business Components development. He has five years of technical experience in Java and C++ server-side programming on multiple platforms and various relational database systems like DB2 UDB, Oracle, MySQL, and Microsoft SQLServer. His areas of expertise include designing and developing stand-alone and n-tier distributed applications in the fields of banking, finance, the catalog industry, and order and warehouse management systems. He holds a Bachelor's degree in Electronics Engineering from the Punjab University, Chandigarh, and is the technologist of Dictionary and Thesaurus for Java, launched at IBM alphaWorks.



