Investigating writers' attitudes by mining a large corpus of books

Bhattacharyya, S., & Bhattacharya Mehta, R. (2014, January). Investigating writers' attitudes by mining a large corpus of books. Poster presented at the Fourth Annual Postdoctoral Research Symposium, Urbana, IL.

Researchers in history and literary studies often want to know what attitudes writers held toward a specific subject, and they need an efficient discovery mechanism for identifying which texts in a large collection would repay close study. Our work in progress approaches this problem in a scalable way by combining a list-based search for collocations within the corpus with filtering based on available bibliographic metadata. Our use case investigates the attitudes of French-language and English-language writers towards women’s work in the colonized world. We generate three separate lists of occurrences, each indexed by line number, page number, and book ID number: one for words relating to womanhood (and a set of close synonyms), one for words relating to work (and a set of close synonyms), and one for words expressive of attitudes. From these lists we then identify co-occurrences of all three kinds of term within a specified proximity window. The resulting list of co-occurrences serves as the basis for the discovery mechanism that identifies relevant texts, and for aggregate-level analysis that enables comparative measures.
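
The list-and-intersect procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The word lists, the toy corpus layout (book ID, then page number, then a list of lines), and the function names are all hypothetical, and the proximity window is assumed here to be measured in lines within a single page of a single book.

```python
import re
from collections import defaultdict

# Hypothetical word lists; the real lists would be far larger and bilingual.
WOMAN_TERMS = {"woman", "women", "girl", "wife"}
WORK_TERMS = {"work", "labour", "labor", "toil"}
ATTITUDE_TERMS = {"admire", "pity", "despise", "praise"}


def index_occurrences(corpus, terms):
    """List (book_id, page_no, line_no) for every line containing a term."""
    hits = []
    for book_id, pages in corpus.items():
        for page_no, lines in pages.items():
            for line_no, text in enumerate(lines, start=1):
                tokens = set(re.findall(r"[a-z']+", text.lower()))
                if tokens & terms:
                    hits.append((book_id, page_no, line_no))
    return hits


def group_by_page(hits):
    """Group line numbers under their (book_id, page_no) key."""
    grouped = defaultdict(list)
    for book_id, page_no, line_no in hits:
        grouped[(book_id, page_no)].append(line_no)
    return grouped


def cooccurrences(a_hits, b_hits, c_hits, window=2):
    """Find triples on the same page whose line numbers span <= window."""
    a, b, c = (group_by_page(h) for h in (a_hits, b_hits, c_hits))
    results = []
    # Only pages that appear in all three lists can hold a co-occurrence.
    for key in sorted(a.keys() & b.keys() & c.keys()):
        for la in a[key]:
            for lb in b[key]:
                for lc in c[key]:
                    if max(la, lb, lc) - min(la, lb, lc) <= window:
                        results.append((key[0], key[1], la, lb, lc))
    return results


# Toy corpus (invented for illustration): book_id -> page_no -> lines.
corpus = {
    "book1": {
        1: [
            "The women of the village",
            "work in the fields all day",
            "and travellers admire them",
        ],
        2: ["Nothing relevant here"],
    },
    "book2": {1: ["He wrote about the harvest"]},
}

women_hits = index_occurrences(corpus, WOMAN_TERMS)
work_hits = index_occurrences(corpus, WORK_TERMS)
attitude_hits = index_occurrences(corpus, ATTITUDE_TERMS)
matches = cooccurrences(women_hits, work_hits, attitude_hits, window=2)
```

Each entry in `matches` names a book, a page, and the three line numbers involved, so the list can point a reader to candidate passages or be counted per book for aggregate comparison. Filtering by bibliographic metadata (for example, language or publication date) would be a separate step applied to the book IDs.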