Probabilistic Retrieval of Full-Text Document Collections Using Logistic Regression
This project develops and tests new probabilistic approaches to text retrieval. Using the statistical technique of logistic regression, documents are ranked in order of their estimated probability of relevance to a query. The methods are evaluated in rigorous performance experiments on the document and query collections of the TREC (Text REtrieval Conference) series, sponsored by NIST (National Institute of Standards and Technology) and DARPA (Defense Advanced Research Projects Agency).
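
As a rough illustration of ranking by estimated probability of relevance, the Python sketch below applies a logistic function to a weighted sum of query-document features. The feature names (matching_terms, mean_log_tf), the intercept, and the weights are assumptions made up for this example; they are not the project's fitted model, whose coefficients would be estimated from relevance-judged training data.

```python
import math

# Hypothetical coefficients, chosen for illustration only; in practice they
# would be fitted by logistic regression on relevance-judged training data.
INTERCEPT = -3.0
WEIGHTS = {"matching_terms": 0.8, "mean_log_tf": 1.2}


def relevance_probability(features):
    """Map a feature dict to an estimated probability of relevance."""
    log_odds = INTERCEPT + sum(
        WEIGHTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank(doc_features):
    """Return (doc_id, probability) pairs sorted by decreasing probability."""
    scored = [
        (doc_id, relevance_probability(f)) for doc_id, f in doc_features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy feature values for three documents against one query (made up).
docs = {
    "d1": {"matching_terms": 1, "mean_log_tf": 0.0},
    "d2": {"matching_terms": 3, "mean_log_tf": 1.1},
    "d3": {"matching_terms": 2, "mean_log_tf": 0.7},
}
ranking = rank(docs)
for doc_id, p in ranking:
    print(f"{doc_id}: {p:.3f}")
```

Because the logistic function is monotonic, sorting by probability gives the same order as sorting by the log-odds; the probability form is useful when the scores themselves need to be interpreted or compared across queries.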

Specifically, the project investigates logistic regression retrieval in the following areas:
1) Comparison of different logistic regression retrieval models
2) New theoretical models that combine intelligent Boolean filtering with logistic regression
3) Application of the models to Chinese and Spanish language retrieval
4) Research into methods for cross-language (English-German) retrieval
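
The page does not say how the Boolean filter is constructed, so the following Python sketch shows only one simple way a Boolean filter could be combined with logistic regression ranking: documents that fail a conjunctive (AND) filter are discarded, and the survivors are ranked by a logistic score. The filter form, the single overlap feature, and the intercept and weight values are all assumptions for illustration, not the project's method.

```python
import math


def passes_filter(doc_terms, required_terms):
    """Boolean AND filter: keep a document only if it has every required term."""
    return required_terms <= doc_terms


def probability(doc_terms, query_terms, intercept=-2.0, weight=1.0):
    """Logistic estimate from the number of query terms the document contains.

    The intercept and weight are illustrative placeholders, not fitted values.
    """
    overlap = len(doc_terms & query_terms)
    return 1.0 / (1.0 + math.exp(-(intercept + weight * overlap)))


def filter_then_rank(docs, query_terms, required_terms):
    """Drop documents that fail the Boolean filter, then rank the rest."""
    kept = {d: t for d, t in docs.items() if passes_filter(t, required_terms)}
    return sorted(
        kept, key=lambda d: probability(kept[d], query_terms), reverse=True
    )


# Toy collection: each document is represented as a set of terms (made up).
docs = {
    "d1": {"oil", "spill", "alaska", "cleanup"},
    "d2": {"oil", "price", "market"},
    "d3": {"oil", "spill"},
}
result = filter_then_rank(
    docs,
    query_terms={"oil", "spill", "cleanup"},
    required_terms={"oil", "spill"},
)
print(result)
```

In this toy run, d2 is removed by the filter because it lacks one of the required terms, and the remaining documents are ordered by their logistic scores.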

This project will advance modern text and document retrieval by developing sound theoretical models of the retrieval process that achieve high performance in experimental tests on millions of documents. The research will contribute to the understanding of the mechanisms of multilingual retrieval by applying its methodologies to queries and document collections in Chinese and Spanish.


Related Reports
1. Gey, F, and A Chen, "Term Importance in Routing Retrieval," submitted to Information Retrieval, February 1998.
2. Gey, F, and A Chen, "Intelligent Boolean Filtering for Routing Retrieval," in preparation for publication submittal.
3. Gey, F, and A Chen, "Phrase Discovery for English and Cross-Language Retrieval," in Proceedings of TREC-6, the Sixth NIST-DARPA Text REtrieval Conference, National Institute of Standards and Technology, Washington, DC (November 19-21, 1997).
4. Chen, A, L Xu, J He, F Gey, and J Meggs, "Chinese Text Retrieval Without Using a Dictionary," in Proceedings of SIGIR97, the 20th annual ACM conference on Research and Development in Information Retrieval, Philadelphia, PA, July 26-31, 1997, pages 42-49.
5. Gey, F, A Chen, J He, J Meggs, and L Xu, "Term Importance, Boolean Conjunct Training, Negative Terms, and Foreign Language Retrieval: Probabilistic Algorithms at TREC-5," in Proceedings of TREC-5, the Fifth NIST-DARPA Text REtrieval Conference, National Institute of Standards and Technology, Washington, DC (November 20-22, 1996), pages 181-190.
6. He, J, L Xu, A Chen, J Meggs, and F Gey, "Berkeley Chinese Information Retrieval at TREC-5: Technical Report," in Proceedings of TREC-5, the Fifth NIST-DARPA Text REtrieval Conference, National Institute of Standards and Technology, Washington, DC (November 20-22, 1996), pages 191-195.
2007/05/02 10:04:25