
E-Research Roundtable - Bandits and Browsing: Data Mining and Network Analysis for Library Collections

Wednesday, April 25, 2012
12:30 - 2:00

242 LIS

Event Details

Session leaders: Harriet Green, English and Digital Humanities Librarian and Assistant Professor of Library Administration, University Library; Kirk Hess, Digital Humanities Specialist, University Library; and Richard Hislop, Economics PhD Candidate, University of Illinois
Description: Our project will conduct network analyses and regression analyses on a data set of 22 million items indexed in the University of Illinois Library catalog. Based on the results of these network analyses, we will begin developing an enhanced recommender system for library catalogs and digital libraries. The system will retrieve richer results from a collection search by drawing on network analysis of subject relevancy, circulation data for items, and usage data for items that share interrelated subjects. To build a test bed for the recommender system's algorithms and functionality, we are using the advanced computing resources of XSEDE to develop self-optimizing search algorithms and network analyses that run against the bibliographic and catalog data in the University of Illinois Library catalog and digital library indexes. We have created initial prototypes of the search algorithms, topic analyses, and network analyses using a 40,000-item sample set from the English literature collection. One core algorithm that we developed identifies items that are infrequently used yet highly topically relevant to heavily used works in the collection. Another search algorithm identifies subject relevancy between items in a library collection through a multi-faceted approach that incorporates subject heading correlations, user behavior in circulation transactions, and the probability of circulation usage. Based on these and other analyses of the sample data set, we will test the scalability of the search algorithms and network analyses by expanding them to run against the full 22-million-item set of University of Illinois Library catalog data on the Blacklight cluster in XSEDE.
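
As a rough illustration of the first idea in the description (surfacing rarely circulated items that are topically close to heavily circulated ones), the following is a minimal sketch and not the project's actual algorithm. It assumes each item is reduced to a set of subject headings and a circulation count, and it uses Jaccard overlap of subject headings as a stand-in for topical relevance. The records, the thresholds, and the names `ITEMS`, `jaccard`, and `hidden_gems` are all hypothetical.

```python
# Hypothetical toy records: (item_id, subject headings, circulation count).
ITEMS = [
    ("A", {"Poetry", "Romanticism"}, 120),
    ("B", {"Poetry", "Romanticism", "Nature"}, 2),
    ("C", {"Drama"}, 95),
    ("D", {"Cookery"}, 1),
    ("E", {"Drama", "Tragedy"}, 0),
]


def jaccard(a, b):
    """Share of subject headings two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hidden_gems(items, low_use=5, high_use=50, min_similarity=0.5):
    """List rarely used items that overlap strongly with a heavily used item.

    Returns (item_id, similarity, popular_item_id) tuples, most similar first.
    The thresholds are illustrative assumptions, not values from the project.
    """
    # Heavily used items act as the reference points for topical relevance.
    popular = [(item_id, subjects) for item_id, subjects, count in items
               if count >= high_use]
    results = []
    for item_id, subjects, count in items:
        if count > low_use:
            continue  # only consider infrequently used items
        # Find the heavily used item with the largest subject overlap.
        best_score, best_id = max(
            ((jaccard(subjects, pop_subjects), pop_id)
             for pop_id, pop_subjects in popular),
            key=lambda pair: pair[0],
            default=(0.0, None),
        )
        if best_score >= min_similarity:
            results.append((item_id, best_score, best_id))
    return sorted(results, key=lambda row: -row[1])
```

Calling `hidden_gems(ITEMS)` on the toy data flags items B and E, which share subject headings with the heavily circulated items A and C, and skips D, which has no overlap. A production system over 22 million records would presumably need an index or graph structure rather than this pairwise comparison.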

