
A Comparison of Document, Sentence and Term Event Spaces. In Proceedings of the Joint 21st International Conference on Computational Linguistics (COLING) and the 44th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 601-608, Sydney, Australia

Full APA Reference

Blake, C. (2006). A Comparison of Document, Sentence and Term Event Spaces. In Proceedings of the Joint 21st International Conference on Computational Linguistics (COLING) and the 44th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 601-608). Sydney, Australia.

Publication Abstract

The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. Despite this trend, systems continue to model language at a document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average ISF and ITF values are 5.5 and 10.4 higher than IDF. All language models appeared to follow a power law distribution with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus.
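
To make the three event spaces in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each inverse frequency is the base-2 logarithm of the number of events in the space (documents, sentences, or term occurrences) divided by the number of those events that contain the term. The abstract does not give the exact formula, so this definition, the naive sentence splitter, the tokenizer, and the function name `inverse_frequencies` are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def inverse_frequencies(documents):
    """Return {term: {"idf", "isf", "itf"}} for a list of document strings.

    Assumed definition (not taken from the paper): log2(N / n), where N is
    the number of events in the space and n is the number of events that
    contain the term.
    """
    doc_counts = Counter()   # documents containing the term
    sent_counts = Counter()  # sentences containing the term
    term_counts = Counter()  # raw occurrences of the term
    n_docs = n_sents = n_terms = 0

    for doc in documents:
        n_docs += 1
        doc_vocab = set()
        # Naive sentence split on ., ! and ? (an assumption for illustration).
        for sentence in re.split(r"[.!?]+", doc):
            tokens = re.findall(r"[a-z0-9]+", sentence.lower())
            if not tokens:
                continue
            n_sents += 1
            n_terms += len(tokens)
            term_counts.update(tokens)
            sent_counts.update(set(tokens))
            doc_vocab.update(tokens)
        doc_counts.update(doc_vocab)

    return {
        term: {
            "idf": math.log2(n_docs / doc_counts[term]),
            "isf": math.log2(n_sents / sent_counts[term]),
            "itf": math.log2(n_terms / term_counts[term]),
        }
        for term in term_counts
    }


scores = inverse_frequencies(["The cat sat. The dog ran.", "A cat slept."])
for term in ("cat", "dog"):
    print(term, scores[term])
```

The sketch collects all three counts in a single pass, so the same tokenization feeds each event space and the resulting values can be compared directly.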
