From R&D to Practice -- Challenges to Multilingual Information Access in the Real World
David A Evans
Clairvoyance Corporation
5001 Baum Boulevard, Suite 700
Pittsburgh, PA 15213-1854
ABSTRACT
Despite the remarkable success of cross-language information retrieval (CLIR)
and translingual information retrieval (TLIR) systems in performing on a par
with monolingual IR systems in research and evaluation
contexts, there has been relatively little commercial development (or success)
of TLIR systems and applications. This is due, in part, to a lack of demand in
the marketplace, but also, in perhaps greater measure, to the special
requirements that may be associated with TLIR applications -- requirements that
are not typically addressed (or assessed) in our research evaluations. In
real-world contexts, the demands on a TLIR system may include
(a) automatic or semi-automatic adjustment to proper names and domain-specific terms;
(b) retrieval of semi-structured information (such as tables); and
(c) support for non-retrieval-specific applications such as portals, FAQ systems, and text mining.
In addition, in commercial TLIR systems there is a greater need for end-user
support, reflected in requirements such as translation or summarization
(possibly "gisting") of retrieved information. This presentation reflects on
the practical challenges of moving a
TLIR system from a laboratory prototype to a commercial product. It also
characterizes the state of the market, including demand and trends; reviews
several current commercial TLIR systems; and offers thoughts on specific TLIR
functions that may emerge as critical features in future commercial
applications.