From R&D to Practice -- Challenges to Multilingual Information Access in the Real World

David A Evans

Clairvoyance Corporation

5001 Baum Boulevard, Suite 700

Pittsburgh, PA 15213-1854

dae@clairvoyancecorp.com

 

ABSTRACT

 

Despite the remarkable success of cross-language information retrieval (CLIR) and translingual information retrieval (TLIR) systems in performing on a par with monolingual IR systems in research and evaluation contexts, there has been relatively little commercial development (or success) of TLIR systems and applications. This is due in part to a lack of demand in the marketplace but also, perhaps in greater measure, to the special requirements that may be associated with TLIR applications -- requirements that are not typically addressed (or assessed) in our research evaluations. In real-world contexts, the demands on a TLIR system may include:

 

(a)   automatic or semi-automatic adjustment to proper names and domain-specific terms;

(b)  retrieval of semi-structured information (such as tables); and

(c)   support for non-retrieval-specific applications such as portals, FAQ systems, and text mining.

 

In addition, commercial TLIR systems have a greater need for end-user support, reflected in requirements such as translation or summarization (possibly "gisting") of retrieved information. This presentation reflects on the practical challenges of moving a TLIR system from a laboratory prototype to a commercial product. It also characterizes the state of the market, including demand and trends; reviews several current commercial TLIR systems; and offers thoughts on specific TLIR functions that may emerge as critical features in future commercial applications.