
Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques


This project is funded by a Google Digital Humanities Award. The work aims to improve people's ability to find information in large collections of books, such as the collection created by the Google Books project.
In particular, we are focusing on historical language change. Google Books contains millions of books in English, but English is a moving target: fourteenth-century vernacular is very different from its twentieth-century counterpart. Thus a query issued in modern English will often fail to find related Middle English documents. People researching the history of a proverb such as "many hands make light work," or looking for literary allusions to the Shield of Achilles (a common example of ekphrasis, a poetic trope), can find historically diverse passages only by issuing queries in many forms and styles.
To improve on this situation, we are applying cross-language information retrieval models to the problem of retrieving passages from historically diverse corpora. The primary goal of this project is to propose statistical models (and build software that instantiates them) that allow a single query to retrieve relevant information from documents spanning a wide variety of English historical periods.
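As a rough illustration of the general idea (not the project's actual models or software), the sketch below scores documents with a simple translation-based query likelihood, in which each modern query term may be "translated" into historical spelling variants. The translation table, its probabilities, the function name, and the sample documents are all invented for this example.

```python
import math
from collections import Counter

# Hypothetical table: for each modern query term, the historical variants it
# may correspond to, with made-up weights standing in for P(modern | variant).
TRANSLATION = {
    "many": {"many": 1.0, "manye": 0.8},
    "hands": {"hands": 1.0, "hondes": 0.8},
    "make": {"make": 1.0, "maken": 0.8},
    "work": {"work": 1.0, "werk": 0.8},
}

EPSILON = 1e-9  # small floor so unmatched terms do not produce log(0)


def translation_score(query_terms, doc_tokens, translation):
    """Log-likelihood of the query given the document under a
    translation-based language model (an illustrative assumption)."""
    if not doc_tokens:
        return float("-inf")
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    log_p = 0.0
    for term in query_terms:
        # Fall back to exact matching when no variants are known.
        variants = translation.get(term, {term: 1.0})
        # Sum over document terms t: P(term | t) * P(t | document).
        p = sum(w * counts[t] / total for t, w in variants.items())
        log_p += math.log(p + EPSILON)
    return log_p


query = ["many", "hands", "make", "light", "work"]
# Invented spellings in a Middle English style, for illustration only.
older_doc = ["manye", "hondes", "maken", "light", "werk"]
unrelated_doc = ["the", "shield", "was", "wrought", "of", "gold"]

with_table = translation_score(query, older_doc, TRANSLATION)
without_table = translation_score(query, older_doc, {})
unrelated = translation_score(query, unrelated_doc, TRANSLATION)
```

With the table, the single modern query ranks the older-spelling passage well above the unrelated one; with an empty table, only the exact match on "light" contributes. In practice the translation probabilities would have to be estimated from data rather than written by hand.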

Read more at the project site.

Project PI(s)

PI: Miles Efron
Project Contact: Miles Efron
Funded by: Google

Research Area(s)

Socio-technical Data Analytics
Faculty, researchers and students in the Socio-technical Data Analytics Group design, develop, and evaluate new technologies in order to better understand the dynamic interplay between information, pe…

Project Team

Miles Efron (PI)
Peter Organisciak (Graduate Assistant)


Efron, M. (2010). Linear time series models for term weighting in information retrieval. Journal of the American Society for Information Science and Technology, 61(7), 1299-1312. doi:10.1002/asi.v61:7