All News Related to Carole L. Palmer

April 10, 2012
CIRSS Researchers at 2012 Research Data Access and Preservation (RDAP) Summit
CIRSS researchers presented two posters at the third annual ASIS&T Research Data Access and Preservation (RDAP) Summit, held 22-23 March 2012 in New Orleans, LA. Topics explored at this year's summit included data management plans and policies; training of data management practitioners; discovery of research data; data curation service models; sustainability of data management; and data curation.

The two posters report on CIRSS activities on the Data Conservancy project, funded by NSF and led by partners at Johns Hopkins University.

What Dataset Descriptions Actually Describe: Using the Systematic Assertion Model to Connect Theory and Practice
Karen Wickett, Andrea Thomer, Simone Sacchi, Karen S. Baker, David Dubin

Scientific data is encoded and described with the aim of supporting retrieval, meaningful interpretation, and reuse. Encoding standards for datasets such as FGDC, DwC, and EML typically include tagged metadata elements along with the encoded data, suggesting that, per the Dublin Core 1:1 principle, those elements apply to one and only one entity (a specimen, observation, dataset, etc.). In practice, however, vocabularies are often used to describe different dimensions of scientific data collection and communication processes. Discriminating these aspects offers a more precise account of how symbols and the propositions they express acquire the status of “data” and “data content,” respectively.

In this poster we present an analysis of species occurrence records based on the Systematic Assertion Model (SAM) [DWS]. SAM is a framework for describing the encoding and representation of scientific data, bridging the gap between data preservation models and discipline-specific scientific ontologies. The model is intended to be general enough for any scientific domain, and not bound to any particular methodology or field of study. Since species occurrence records are a kind of data that is frequently re-used, migrated across systems, and shared, they are a good target for analysis.

Integrating Conceptual and Empirical Studies of Data to Guide Curatorial Processes
Carole L. Palmer, Tiffany C. Chao, Nicholas M. Weber, Simone Sacchi, Karen M. Wickett, Allen H. Renear, Karen Baker, Andrea Thomer, & David Dubin

Two research teams within the Data Conservancy project are investigating different aspects of scientific data curation. Data Concepts is developing a conceptual model to foster shared understanding of identity conditions and representation levels for data sets. Data Practices is conducting qualitative studies of data production and use in the earth and life sciences, analyzing curation needs, cultures of sharing, and re-use potential across disciplines. This poster will illustrate the integration of results from three phases of research to develop a more comprehensive and practical analysis of fundamental aspects of data curation.

April 11, 2012
Interview with Carole Palmer on Big Data

GSLIS Professor and CIRSS Director Carole Palmer recently shared her thoughts in the University of Illinois feature, "A Minute With . . .," following the Obama administration's announcement of a $200 million research initiative in "big data" computing.

GSLIS has been at the forefront of data curation education since launching its specialization within the Master of Science degree in 2006, beginning with a focus on the sciences and expanding to include the humanities in 2008. Currently, more than 50 students enroll each year in the Foundations of Data Curation course, with many completing the GSLIS Specialization in Data Curation.
The interview with Professor Palmer was conducted by Dusty Rhodes, news editor for the U of I News Bureau.


May 17, 2011
Palmer participates in NSF-funded Workshop on Cyberinfrastructure for Collaborative Science


Carole Palmer, Director of CIRSS, will participate in the Workshop on Cyberinfrastructure for Collaborative Science, an NSF-funded workshop sponsored by and hosted at NESCent, May 18-20, 2011.

This workshop brings together participants with a diverse set of perspectives, backgrounds, and experiences in enabling multi-disciplinary research collaborations that often rely heavily on informatics to succeed. These include informatics practitioners, social scientists, technology experts, and biologists. The workshop provides an opportunity for these groups to meet and exchange their experiences and challenges in designing and using cyberinfrastructure to enable research collaborations. The event is designed to facilitate the emergence of new targets for better coordination, and to forge new collaborations on how cyberinfrastructure can enable scientific culture change.



February 9, 2011
CIRSS researchers present at iConference 2011


CIRSS researchers present at iConference 2011:

Data Practices across Disciplines: Informing Collections & Curation

Data Curation Education in Research Centers: Carole L. Palmer (UIUC), Suzie Allard (Tennessee), Mary Marlino (NCAR Library)

Annotation evolution: how Web 2.0 technologies are enabling a change in annotation practices: Simone Sacchi (UIUC)

Expressiveness Requirements for Reasoning about Collection/Item Metadata Relationship: Karen Wickett (UIUC)

Data Repositories: A Home for Microblog Archives?: Tiffany Chao (UIUC)


The iSchools are interested in the relationship between information, people and technology. This is characterized by a commitment to learning and understanding the role of information in human endeavors. The iSchools take it as given that expertise in all forms of information is required for progress in science, business, education, and culture. This expertise must include understanding of the uses and users of information, as well as information technologies and their applications.

CIRSS researchers will be presenting their current work at the 2011 conference, February 8th - 11th.



November 6, 2012
IMLS Digital Collections & Content team receives sub-award from the Digital Public Library of America Secretariat
The IMLS Digital Collections & Content (DCC) team has received a $50,000 sub-award from the Digital Public Library of America (DPLA) Secretariat to further refine the team's DPLA Beta Sprint prototype. The prototype was originally developed during Summer 2011 as part of the DPLA Beta Sprint and was selected as one of six finalist projects showcased during the DPLA's Fall 2011 Plenary Meeting in Washington, D.C.

Bringing together resources from over 1,000 cultural heritage collections across the U.S., the prototype builds on the DCC team's knowledge and experience aggregating metadata records from varied institutions ranging from libraries and archives to museums and historical societies. The new sub-award will allow the DCC team to make further refinements to the prototype's information retrieval algorithms and implement additional layers of interactive functionality that allow users to interact more directly and dynamically with the prototype's data.

October 24, 2011
DLF/DCC DPLA Beta Sprint Effort Presented at DPLA Plenary

CIRSS and the Council on Library and Information Resources’ DLF program presented their submission to the Digital Public Library of America (DPLA) Beta Sprint at the DPLA Plenary meeting, October 21, 2011, in Washington, DC.

The project prototype leverages the Institute of Museum and Library Services’ Digital Collections and Content (IMLS DCC) resource and DLF Aquifer content as a core collection for the DPLA. The IMLS DCC, launched in 2003, is an aggregation of digital collections from libraries, museums, and archives, supported by IMLS and developed through a collaboration between CIRSS and the University of Illinois Library.

The DPLA is envisioned as a large-scale digital library that will “make the cultural and scientific heritage of humanity available, free of charge, to all.” In May, the DPLA Steering Committee announced a “Beta Sprint” to solicit models, prototypes, tools, and interfaces that demonstrate how the DPLA might index and provide access to a wide range of broadly distributed content. In September, an independent review panel met to discuss the 38 Beta Sprint submissions and recommend six of the most promising projects to present at the October Plenary.

The Plenary Meeting, organized by the DPLA Secretariat at the Berkman Center for Internet & Society and hosted by The National Archives, brought together a range of stakeholders in an open forum to present the vision for the DPLA effort, share the best ideas and models submitted to the Beta Sprint, and engage public participation.

“I am really proud of our beta sprint, as it highlights the investment made by IMLS, the DLF community, and hundreds of libraries, museums, and archives to produce digital collections,” said DLF Director Rachel Frick.

CIRSS Director Carole Palmer said, “The sprint was a great chance to experiment with the national aggregation model we developed in the IMLS DCC project. We extended the collections, made some technical advances, and reconceived the design for the DPLA community, learning a lot along the way.”


October 3, 2012
CIRSS Receives IMLS Award to Develop Site-Based Data Curation Framework for Long-Tail Science
The Institute of Museum and Library Services (IMLS) has awarded $499,919 for the project, Site-Based Data Curation for Small Science, led by CIRSS Director Carole Palmer, with co-Principal Investigators Bruce Fouke, Professor in Geology, Microbiology, and the Institute for Genomic Biology (IGB) at the University of Illinois at Urbana-Champaign; Sayeed Choudhury, Hodson Director of the Digital Research and Curation Center and Associate Dean of Library Digital Programs at Johns Hopkins University; and Ann Rodman, Director of GIS Operations at Yellowstone National Park.

Bringing together experts in data curation, data repositories, geobiology, and research site management, the Site-Based Data Curation (SBDC) project will investigate and test curation policies and procedures to advance the transfer of long-tail digital data collected at Yellowstone National Park (YNP) to the Data Conservancy for preservation and access, and to better coordinate the management of data resources produced at the many scientifically significant sites at YNP. The framework will result in a general model of professional curation processes readily extendible to other national parks and other important research sites, especially cradles of biodiversity such as coral reefs and deep crustal biosphere locations.

The new data curation approaches will be integrated into the curriculum of the Specialization in Data Curation at GSLIS and undergraduate and graduate geobiology courses taught at Illinois, with educational outreach extended to Yellowstone. The education activities will advance data curation workforce expertise in handling complex, cross-disciplinary data and prepare scientific communities to contribute to and take advantage of diverse collections of curated data.

The SBDC framework is an important step forward in evolving the professional best practices and institutional collaborations needed to build large-scale, interoperable data collections that include high-functioning long-tail data and are responsive to the pressing data needs of practicing research communities and resource management at sites of data production.

September 5, 2012
Carole Palmer to present at Wolfram Data Summit 2012
Carole Palmer, CIRSS Director and Professor of Library and Information Science, will present “The Analytic Potential of Long-Tail Data: Sharable Data and Re-use Value” at Wolfram Data Summit 2012.  

Taking place September 6-7 in Washington, DC, Wolfram Data Summit 2012 is an invitation-only event that offers leaders of the world's data repositories an opportunity to meet, to share insights into their work, and to discuss challenges and opportunities facing the data community. The third annual summit will place an emphasis on content, rather than infrastructure, in areas such as data from social media, location-based data, freeing health care data, data narratives, news as data, natural language processing, government and election data, corporate data silos, culturomics, bibliometric data, data conservation, and semantic data.


October 22, 2012
CIRSS to make strong showing at ASIS&T 2012
CIRSS faculty, affiliated faculty, PhD students, and staff will be making yet another strong showing at ASIS&T's annual meeting. Taking place October 26-30, 2012, the ASIS&T Annual Meeting is a primary venue for disseminating research centered on advances in the information sciences and related applications of information technology.

Find below a list of CIRSS presentations, papers, posters, and workshops:

Unreliable and Uncertain Annotators: Evaluating Rater Quality and Rating Difficulty in Online Annotation Activities
Peter Organisciak, Miles Efron, Katrina Fenlon, and Megan Senseney

Identifying Content and Levels of Representation in Scientific Data
Karen Wickett, Simone Sacchi, David Dubin and Allen Renear

Value and Context in Data Use: Domain Analysis Revisited
Nicholas Weber, Karen Baker, Andrea Thomer, Tiffany Chao and Carole Palmer

Tooling the Aggregator's Workbench: Metadata Visualization Through Statistical Text Analysis
Katrina Fenlon, Miles Efron and Peter Organisciak

Enhancing Cultural Heritage Collections by Supporting and Analyzing Participation in Flickr
Jacob Jett, Megan Senseney and Carole Palmer

Combined Methods, Thick Descriptions: Languages of Collaboration on Github
Nicholas Weber

The Data-at-Risk Initiative:  Analyzing the Current State of Endangered Scientific Data
Angela P. Murillo, Cheryl Thompson, Nico Carver, W. Davenport Robertson, Jane Greenberg and William Anderson

Complications in climate data variable naming
Nic Weber, Andrea Thomer, and Gary Strand

March 13, 2013
Carole Palmer to present on priorities for data curation research and education
Carole Palmer, Professor of Library and Information Science and CIRSS Director, will deliver a presentation titled "Setting Priorities for Data Curation Research and Education" at the University of North Carolina at Chapel Hill School of Information and Library Science on March 20, 2013.

"As our data curation research and education initiatives become more established, we are striving to increase impact on practice by sharpening the focus of our programs and building strategic partnerships. The theme of reuse value is now firmly at the center of our studies and extended through the core curriculum in our emphases on research cultures, collections, and representation. Moreover, our projects and programs are increasingly dependent on contributions from domain researchers, data centers, repository developers, and practitioners. I will present an overview of our investigations of data practices across more than a dozen sub-disciplines in the earth and life sciences, and our work on aggregating and modeling cultural heritage collections with developers of the Europeana Data Model. I will also introduce two important projects that represent our next phase of development: a research collaboration to develop site-based curation principles and processes with geobiologists and resource managers at Yellowstone National Park, and an education initiative providing core curriculum and student field experiences in partnership with the National Center for Atmospheric Research."

May 28, 2013
A Model for Providing Web 2.0 Services to Cultural Heritage Institutions: The IMLS DCC Flickr Feasibility Study in D-Lib
An article by Jacob Jett, Megan Senseney, and Carole Palmer reporting the findings of the IMLS DCC Flickr Feasibility Study has been published in this month's issue of D-Lib Magazine.

The Flickr Feasibility Study, which was launched by the Institute of Museum and Library Services (IMLS) Digital Collections and Content (DCC) project in 2009 to determine how aggregators might provide intermediary services for cultural heritage institutions wishing to engage in Web 2.0 initiatives, shed light on both needs and models for aggregation services. This article provides an overview of the study's findings, including the efficiencies that aggregation services such as the DCC can afford cultural heritage institutions when they act as intermediaries to facilitate Web 2.0 participation. Also discussed are the outcomes of the study's conversations with Yahoo, Inc. representatives regarding aggregators as members of the Commons on Flickr and the complementary cultural heritage spaces that aggregation services can help their member institutions to create outside of the Commons. Finally, the article highlights the ample rewards in long-tail community engagement and user-generated metadata that cultural heritage institutions can reap when they expose their collections to Web 2.0 communities.
Jett, Jacob, Megan Senseney, and Carole L. Palmer, "A Model for Providing Web 2.0 Services to Cultural Heritage Institutions: The IMLS DCC Flickr Feasibility Study," D-Lib Magazine, May-June 2013.

June 3, 2013
Carole Palmer delivers inaugural Ed Mignon Distinguished Lecture in Information Science
CIRSS Director Carole Palmer delivered the inaugural Ed Mignon Distinguished Lecture in Information Science at the Information School at the University of Washington in May. The new annual lecture aims to inspire intellectual thinking and foster creativity in the iSchool community. In her lecture, "Data Curation and the Reuse Value of Digital Research Data: Meeting the Aims of Multiple Disciplines and Stakeholders," Dr. Palmer spoke about CIRSS research to identify curation requirements and reuse value indicators across disciplines through studies of data practices across "long tail" science domains, and discussed recent activities on the CIRSS IMLS Site-Based Data Curation project.

Digital research data are now widely recognized as valuable assets: research resources with tremendous potential for reuse in new and innovative ways. However, realizing this potential will require ready access to extensive bodies of curated data. Advances in the management of digital data are proceeding apace, but curation services in research libraries and data centers need to extend beyond storage, archiving, and preservation to identification of high-value data and provision of data resources that are fit for new purposes. Through our studies of data practices across more than a dozen “long-tail” domains in the earth and life sciences, we are identifying curation requirements across disciplines and indicators of reuse value. The lecture also discusses advances in curation strategies for multiple stakeholders from the Site-Based Data Curation project, a collaboration with geo-biologists, resource managers at Yellowstone National Park, and the Data Conservancy.


August 23, 2013
CIRSS makes strong showing at JCDL 2013
CIRSS PhD students, faculty, and staff made a strong showing at the 2013 Joint Conference on Digital Libraries (JCDL), July 22-26 in Indianapolis. JCDL is a premier international forum for research in digital libraries and associated technical, practical, and social issues.

CIRSS representation at the conference reported on activities of the Digital Collections and Content and Open Annotation Collaboration projects. CIRSS researchers presented two papers and a poster. In addition, JCDL featured two tutorials led by CIRSS researchers and faculty: a technical introduction to the Europeana Data Model, led by Karen Wickett and Katrina Fenlon; and an introduction with working exemplars to the Open Annotation data model, led by Timothy Cole and Jacob Jett.

Addressing diverse corpora with cluster-based term weighting
Peter Organisciak

Highly heterogeneous collections present difficulties to term weighting models that are informed by corpus-level frequencies. Collections that span multiple languages or large time periods do not provide realistic statistics on which words are interesting to a system. This paper presents a case where diverse corpora can frustrate term weighting and proposes a modification that weights documents according to their class or cluster within the collection. In cases of diverse corpora, the proposed modification better represents the intuitions behind corpus-level document frequencies.

Local histories in global digital libraries: Identifying demand and evaluating coverage
Katrina Fenlon & Virgil E. Varvel, Jr.

Digital collections of primary source materials have potential to change how citizen historians and scholars research and engage with local history. The problem at the heart of this study is how to evaluate local history coverage, particularly among large-scale, distributed collections and aggregations. As part of an effort to holistically evaluate one such national aggregation, the Institute of Museum and Library Services (IMLS) Digital Collections and Content (DCC), we conducted a national survey of reference service providers at academic and public libraries throughout the United States. In this paper, we report the results of this survey that appear relevant to local history and collection evaluation, and consider the implications for scalable evaluation of local history coverage in massive, aggregative digital libraries.

Flickr feedback framework: A service model for leveraging user interactions
Jacob Jett, Megan Senseney, & Carole L. Palmer

It has been well documented that cultural heritage institutions can enhance their metadata by sharing content through popular web services such as Flickr. Through the Flickr Feasibility Study, the IMLS Digital Collections and Content project examined how an aggregation service can facilitate participation of cultural heritage institutions in popular web services. This poster presents a proposed feedback framework through which an aggregation service can facilitate and increase the impact of Web user interactions with shared cultural heritage collections through direct metadata enhancement and user analysis.

The Europeana Data Model and Collections (tutorial)
Karen Wickett, Valentine Charles, & Katrina Fenlon

This tutorial provides a technical introduction to the Europeana Data Model and explores the role that collections play in adding value to digital libraries by 1) supporting the information seeking activities of system users, 2) allowing users to build and curate their own collections of resources, and 3) supporting administrative management of resources and metadata. Participants will gain a better understanding of conceptual data modeling, structured collection description, and collection metadata. The tutorial will conclude with a discussion of practitioners’ experience with items and collections in a digital library context and next steps for collection modeling research.

Using Open Annotation (tutorial)
Timothy Cole, Robert Sanderson, Thomas Habing, & Jacob Jett

This tutorial introduces digital library, humanities and science computing, and semantic web developers, project managers, and experimenters to the essential components of the Open Annotation (OA) data model and specification; working exemplars of how the OA data model has been applied to various annotation use cases; and helpful links, resources, guides, and libraries for implementing the OA data model.

September 12, 2013
Carole Palmer to deliver keynote at meeting of the Research Data Alliance
Carole Palmer, Director of CIRSS and Professor at GSLIS, will deliver a keynote address at next week’s second biannual plenary meeting of the Research Data Alliance (RDA). The three-day meeting, September 16-18, will be held at the National Academy of Sciences in Washington, D.C., and will involve conversations between top leaders from the White House and US science agencies and their international colleagues.
Palmer’s keynote, “Fueling and Transforming Evidential Cultures of Research”, will be presented during a session on the Benefits and Possibilities of Open Data Sharing, chaired by CIRSS affiliated researcher Sayeed Choudhury, Associate Dean for Research Data Management at the Johns Hopkins University.


September 25, 2013
Palmer named Outstanding Information Science Teacher by ASIS&T
[As announced on the GSLIS web site:]

Carole Palmer (PhD ’96), director of the Center for Informatics Research in Science and Scholarship (CIRSS) and professor at the Graduate School of Library and Information Science at Illinois, has received the 2013 Thomson Reuters Outstanding Information Science Teacher Award given by the Association for Information Science and Technology (ASIS&T).

A member of the GSLIS faculty for 18 years, Palmer has made a significant impact in the lives and careers of the many students she has mentored. She has been integral in the development of new courses and educational opportunities at GSLIS and is highly respected for the amount of time and care she takes with students. She is frequently named to the List of Teachers Ranked as Excellent issued by campus each semester.

“I am very fortunate to be at a school that attracts such talented students, who are truly partners in learning. They are already on the path to becoming accomplished professionals when they arrive at GSLIS. My job is easy—help them see how interesting, exciting, and important it is to be working in, but also building up, the information professions,” said Palmer.

Palmer has developed and taught a number of courses for both master’s and doctoral students, including Use and Users of Information, Foundations of Data Curation, Information Transfer and Collaboration in Science, and Knowledge Studies for Information Science, among others. Her role as principal investigator on a number of educational grants since 2006 accelerated the development of a GSLIS master’s degree in biological informatics as well as a specialization in data curation.

“Carole is richly deserving of this award. She is committed to preparing her students as leaders in research and practice in information science and to leading the field in advancing preparation for new roles for information professionals,” said Linda C. Smith, GSLIS professor and associate dean for academic programs.

Palmer blends dedication to her students with an astute understanding of the current issues surrounding digital information systems and services, especially in new areas related to the collection and preservation of research data. She has built an impressive research portfolio, which has been funded by the National Science Foundation, the Institute of Museum and Library Services, and The Andrew W. Mellon Foundation, among others.

In addition, Palmer is recognized as a national leader in the field of data curation, often invited to speak on issues surrounding education and workforce development. She serves on the Study Committee on Future Career Opportunities and Educational Requirements for Digital Curation convened by the National Academy of Sciences. She also presents and publishes prolifically in the area.

“Carole has always had her eyes on the future of information science and its potential for addressing the great challenges of the twenty-first century. The leadership and achievements of her students, working in many different roles and institutions, is testimony not only to her skill and influence as a teacher but to her deep understanding of those possibilities,” said Allen Renear, GSLIS interim dean.

GSLIS doctoral candidate Nicholas Weber, who nominated Palmer for the award, notes that she is an “exemplar educator, a passionate and committed mentor and above all else, a profoundly kind and generous human being that has dedicated a significant portion of her own career to the advancement of others. She has the unique ability to motivate people to achieve their absolute maximum potential and has had a profound impact on my career through her mentorship and the generous sharing of her own knowledge and expertise in this field. Her guidance, patience and dedication have helped me become a better researcher and a better instructor.”

Palmer will be presented with the award at the 2013 ASIS&T Annual Meeting, which will be held November 1-5 in Montreal, Canada.

ASIS&T previously honored two other GSLIS faculty members with the Outstanding Information Science Teacher award: the late F. W. Lancaster, who served on the GSLIS faculty between 1970 and 1992 and was the first recipient of the award in 1980, and Linda C. Smith, who was honored in 1987.

October 2, 2013
Palmer, Weber to present at John Deere Big Data Summit
[As announced on the GSLIS web site 9/30/13:]

Carole Palmer, GSLIS professor and director of CIRSS, and Nicholas Weber, GSLIS doctoral candidate, will share their expertise in data curation at the upcoming John Deere Big Data Summit, which will be held in Champaign on October 1 and 2. The summit is intended to bring together analytic and big data thought leaders from inside and outside the company to showcase cutting-edge academic thinking, applications, and real-life examples.

Palmer will present, "Data Curation: Investing in the Reuse Value of Digital Data":

Digital research data are now widely recognized as valuable assets—research resources with tremendous potential for reuse in new and innovative ways. Advances in the storage, archiving, and preservation of digital data are proceeding apace, but curation services are needed that extend to the identification of high-value data and provision of data resources fit for new purposes. In this presentation I will discuss our studies of data practices in the sciences, focusing on indicators of reuse value and curation approaches for data consumers vs. data producers. We will also consider the broader implications of curatorial awareness on the cultures of research operations and for institutions committed to investing in high-value, reusable data resources.

Weber will present, "Curating and Profiling Enterprise Data @ John Deere":

This presentation summarizes the findings of a pilot project between John Deere & Co. and the Center for Informatics Research in Science and Scholarship (CIRSS). This work was focused on gathering requirements for the development of new data curation infrastructures and services to support the analysis of big data, as well as the sharing, reuse and sustained archiving of "small data" produced at John Deere.

"The data curation research taking place at CIRSS is critical to the effectiveness and competitive success of twenty-first century corporations. We are very pleased to be participating in this summit," said Allen Renear, GSLIS interim dean.

October 16, 2013
Digital Humanities Data Curation Institute workshop underway at MITH
The second in a series of Digital Humanities Data Curation workshops is currently underway at the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland (UMD). The series of three-day intensive workshops is hosted by Digital Humanities Data Curation (DHDC), a collaborative research project supported by CIRSS, MITH, and Northeastern University that seeks to develop strong data curation practices within the digital humanities community through the workshop series as well as an online learning resource, the DH Curation Guide.

Instructors Trevor Muñoz (MITH), Julia Flanders (Northeastern University), and Dorothea Salo (University of Wisconsin-Madison) will guide 20 participants through lectures and hands-on exercises, providing them with relevant skills and techniques, such as modeling humanities data for sustainable computational research, mitigating risks to data, developing and implementing data management plans, and evaluating the tools and systems that support data curation.

This week's workshop extends the curriculum originally developed for DHDC's inaugural workshop, held in June 2013 at the Graduate School of Library & Information Science at the University of Illinois, Urbana-Champaign. Participants will be exposed to enhanced examinations of humanities data curation, including a diverse set of practical exercises and analyses of humanities metadata and metadata systems. During the workshop, participants will have the opportunity to implement their new data curation skills in a case study presented by Kari Kraus, Associate Professor in the College of Information Studies and the Department of English at UMD, and will also be encouraged to share information about their own projects and humanities data, including the curation challenges that arise in their institutions or as part of their individual research.

The third workshop in the series is scheduled for May 2014 in Boston. Applications for participation will be available online in the coming months. To receive information about future workshops and other DHDC events, subscribe to the mailing list, and keep up with this week's DHDC conversations on Twitter by following @DHCuration or searching #dhcuration.

Digital Humanities Data Curation is a project of the Maryland Institute for Technology in the Humanities, the Women Writers Project at Northeastern University, and the Center for Informatics Research in Science and Scholarship. This workshop series is generously funded by an Institute for Advanced Topics in the Digital Humanities grant from the National Endowment for the Humanities.

November 6, 2013
CIRSS makes strong showing at ASIS&T 2013
CIRSS students and faculty made a strong showing at the 2013 American Society for Information Science and Technology (ASIS&T) meeting, November 1-5 in Montreal. ASIS&T is a premier annual conference, bringing together research on advances in the information sciences and related applications of information technology. This year’s theme was "Beyond the Cloud: Rethinking Information Boundaries".
In addition to the numerous presentations offered by CIRSS researchers, Carole Palmer, Director of CIRSS and GSLIS Professor, was honored with the 2013 Thomson Reuters Outstanding Information Science Teacher Award. Palmer received the award at the Annual Luncheon on November 5.

Finding Information in Books: Characteristics of Full-Text Searches in a Collection of 10 Million Books
Craig Willis & Miles Efron

Searching large collections of digitized books is a relatively new area in information-seeking and retrieval research, made possible by initiatives such as Google Books and the HathiTrust Digital Library. Traditionally, book search has relied exclusively on descriptive metadata, either in online library catalogs or bookstores. Today, the availability of large full-text book collections is transforming how users search and interact with information in books. But the characteristics of these changes are unknown. For this paper, we analyzed query logs from the HathiTrust Digital Library full-text search engine. We analyzed one year of full-text query logs to better understand the types of queries that users are issuing to full-text book collections. We manually classified a random sample of 600 queries to develop a taxonomy of book search query types. We found that users are beginning to search for information in books instead of searching for books. Searches still largely follow bibliographic models, but, as expected, new types of searches are beginning to take advantage of full-text capabilities. Additionally, comparing the results of our query log analysis to searches in other domains we found similar search patterns including short queries, sessions with only a few queries, and users viewing only a few pages of results per query. This study is the first step in a broader research agenda intended to improve searching for information in books.
Building a Framework for Site-Based Data Curation
Carole Palmer, Andrea Thomer, Karen Baker, & Karen Wickett
Exploring Evaluative Methods for Large-Scale Local History
Katrina Fenlon
Specialization in Data Curation: Preliminary Results from an Alumni Survey, 2008-2012
Cheryl Thompson, Karen Baker, Carole Palmer, & Megan Senseney

November 19, 2013
CIRSS releases white paper on modeling collections for digital aggregation and exchange environments
CIRSS is pleased to announce the public release of “Modeling Cultural Collections for Digital Aggregation and Exchange Environments,” a collaborative white paper developed by the Center for Informatics Research in Science and Scholarship (Wickett, Fenlon, Palmer, Jett) in cooperation with key developers of EDM, the Europeana Data Model (Isaac, Doerr, Meghini).  

The report outlines a formal extension of EDM that explicitly accommodates representation of collections and collection/item relationships and reports on the outcomes of the collaboration – use cases, requirements, and recommendations for modeling collections in exchange and aggregation environments.  

Collections are an important aspect of institutional identity for the organizations that invest in their curation, digitization and public access. Collections can also function as core constructs in information organization systems, providing technical capabilities for retrieval and evaluation of content within large aggregations. Perhaps most importantly, collection structures provide the organizational and intellectual context important to users for interpreting the relevance and significance of individual items for their purposes.

Guiding requirements for modeling and representing collections state that (1) collections are treated as distinct, individual resources within the aggregation, with an identifier and a description that documents properties of the collection as a whole; (2) item-level entities are explicitly linked to collection-level entities; and (3) the set of properties (i.e. schema) describes collections in ways that support users and administrators, especially institutional and contextual properties.

The full text is available via IDEALS at

December 5, 2013
CIRSS researchers to present at American Geophysical Union (AGU) Fall Meeting
CIRSS researchers will make a strong showing as they share their expertise in scientific data curation at next week’s 46th annual Fall Meeting of the American Geophysical Union (AGU). The meeting, December 9-13 in San Francisco, California, is the largest worldwide conference in the geophysical sciences, gathering more than 24,000 Earth and space scientists, educators, students, and other leaders.

Representing CIRSS at this year’s AGU conference are Carole Palmer, GSLIS professor and director of CIRSS, and CIRSS PhD students Karen Baker and Andrea Thomer. Palmer and Baker have been invited to give presentations on earth and space science informatics. Palmer’s talk, part of the Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science session, will present research from the CIRSS project Site-Based Data Curation at Yellowstone National Park (SBDC). Baker will present her research into data management issues and strategies originating from within long-term research communities, as part of the meeting’s session on Data Stewardship in Theory and in Practice.


Advancing Site-Based Data Curation for Geobiology: The Yellowstone Exemplar (Invited presentation by C. L. Palmer)
C. L. Palmer, B. W. Fouke, A. Rodman, G. S. Choudhury

While advances in the management and archiving of scientific digital data are proceeding apace, there is an urgent need for data curation services to collect and provide access to high-value data fit for reuse. The Site-Based Data Curation (SBDC) project is establishing a framework of guidelines and processes for the curation of research data generated at scientifically significant sites. The project is a collaboration among information scientists, geobiologists, data archiving experts, and resource managers at Yellowstone National Park (YNP). Based on our previous work with the Data Conservancy on indicators of value for research data, several factors made YNP an optimal site for developing the SBDC framework, including unique environmental conditions, a permitting process for data collection, and opportunities for geo-located longitudinal data and multiple data sources for triangulation and context. Stakeholder analysis is informing the SBDC requirements, through engagement with geologists, geochemists, and microbiologists conducting research at YNP and personnel from the Yellowstone Center for Resources and other YNP units. To date, results include data value indicators specific to site-based research, minimum and optimal parameters for data description and metadata, and a strategy for organizing data around sampling events. New value indicators identified by the scientists include ease of access to park locations for verification and correction of data, and stable environmental conditions important for controlling variables. Researchers see high potential for data aggregated from the many individual investigators conducting permitted research at YNP; however, reuse is clearly contingent on detailed and consistent sampling records.
Major applications of SBDC include identifying connections in dynamic systems, spatial temporal synthesis, analyzing variability within and across geological features, tracking site evolution, assessing anomalies, and greater awareness of complementary research and opportunities for collaboration. Moreover, making evident the range of available YNP data will inform what should be explored next, even beyond YNP.
Like funding agencies and policy makers, YNP researchers and resource managers are invested in data curation for strategic purposes related to the big picture and efficiency of science. For the scientists, YNP represents an ideal, protected natural system that can serve as an indicator of world events, and SBDC provides the ability to ask and answer broader research questions and leverage an extensive store of highly applicable data. SBDC affords YNP improved coordination and transparency of data collection activities, and easier identification of trends and connections across projects. SBDC capabilities that support broader inquiry and better coordination of scientific effort have clear implications for data curation at other research intensive sites, and may also inform how data systems can provide strategic assistance to science more generally.

Enabling Long-Term Earth Science Research: Changing Data Practices (Invited presentation by K. S. Baker)
K. S. Baker

Data stewardship plans are shaped by our shared experiences. As a result, community engagement and collaborative activities are central to the stewardship of data. Since modes and mechanisms of engagement have changed, we benefit from asking anew: "Who are the communities?" and "What are the lessons learned?" Data stewardship, with its long-term care perspective, is enriched by reflection on community experience. This presentation draws on data management issues and strategies originating from within long-term research communities as well as on recent studies informed by library and information science. Ethnographic case studies that capture project activities and histories are presented as resources for comparative analysis. Agency requirements and funding opportunities are stimulating collaborative endeavors focused on data re-use and archiving. Research groups including earth scientists, information professionals, and data systems designers are recognizing the possibilities for new ways of thinking about data in the digital arena. Together, these groups are re-conceptualizing and reconfiguring for data management and data curation. A differentiation between managing data for local use and production of data for re-use in locations and fields remote from the data origin is just one example of the concepts emerging to facilitate development of data management. While earth scientists as data generators have the responsibility to plan new workflows and documentation practices, data and information specialists have responsibility to promote best practices as well as to facilitate the development of community resources such as controlled vocabularies and data dictionaries. With data-centric activities and changing data practices, the potential for creating dynamic community information environments in conjunction with development of data facilities exists but remains elusive.


Two-Stream Model: Toward Data Production for Sharing Field Science Data (Presented by K. S. Baker)
K. S. Baker, C. L. Palmer, A. K. Thomer, K. Wickett, T. DiLauro, A. E. Asangba, B. W. Fouke, G. S. Choudhury

Scientific data play a central role in the production of knowledge reported in scientific publications. Today, data sharing policies together with technological capacity are fueling visions of data as open and accessible where data appear to stand alone as products of the research process. Yet, guidelines and outputs are constantly being produced that impact subsequent work with the data, particularly in field-oriented, data-rich earth science research. We propose a model that focuses on two distinct yet intertwined data streams: internal-use data and public-reuse data. Internal-use data often involves a complex mix of processing, analysis and integration strategies creating data in forms leading to the publication of papers. Public-reuse data is prepared with a more standardized set of procedures creating data packages in the form of well-described, parameter-based datasets for release to a data repository and for reuse by others. While scientific researchers are familiar with collecting and analyzing data for publication in the scientific literature, the second data stream helps to identify tasks relating to the preparation of data for future, unanticipated reuse. The second stream represents an expansion in conceptualization of data management for the majority of natural scientists from a publication metaphor to recognition of a release metaphor (Parsons and Fox 2012). A combined dual-function model brings attention to some of the less recognized barriers that impede preparation of data for reuse. Digital data analysis spawns a multitude of files often assessed while in use, so for reuse of data, scientists must first identify what data files to share. They must also create robust data processes that frequently involve establishing new distributions of labor. The two-stream approach creates a visual representation for data generators who now must think about what data are most likely to have value not only for their work but also for the work of others.
Development of this approach is part of a collaborative project studying site-based data curation in geobiology for geologists, geochemists, and microbiologists at Yellowstone National Park.

How Workflow Documentation Facilitates Curation Planning (Presented by A. K. Thomer)
A. K. Thomer, K. Wickett, K. S. Baker, T. DiLauro, A. E. Asangba

The description of the specific processes and artifacts that led to the creation of a data product provides a detailed picture of data provenance in the form of a workflow. The Site-Based Data Curation project, hosted by the Center for Informatics Research in Science and Scholarship at the University of Illinois, has been investigating how workflows can be used in developing curation processes and policies that move curation upstream in the research process. The team has documented an individual workflow for geobiology data collected during a single field trip to Yellowstone National Park. This specific workflow suggests a generalized three-stage process for field data collection: a Planning Stage, a Fieldwork Stage, and a Processing and Analysis Stage. Beyond supplying an account of data provenance, the workflow has allowed the team to identify 1) points of intervention for curation processes and 2) data products that are likely candidates for sharing or deposit. Although these objects may be viewed by individual researchers as "intermediate" data products, discussions with geobiology researchers have suggested that with appropriate packaging and description they may serve as valuable observational data for other researchers. Curation interventions may include the introduction of regularized data formats during the planning process, data description procedures, the identification and use of established controlled vocabularies, and data quality and validation procedures.
We propose a poster that shows the individual workflow and our generalization into a three-stage process. We plan to discuss with attendees how well the three-stage view applies to other types of field-based research, likely points of intervention, and what kinds of interventions are appropriate and feasible in the example workflow.

Research Problems in Data Curation: Outcomes from the Data Curation Education in Research Centers Program (Presented by C. L. Palmer)
C. L. Palmer, M. S. Mayernik, N. Weber, K. S. Baker, K. Kelly, M. R. Marlino, C. A. Thompson

The need for data curation is being recognized in numerous institutional settings as national research funding agencies extend data archiving mandates to cover more types of research grants. Data curation, however, is not only a practical challenge. It presents many conceptual and theoretical challenges that must be investigated to design appropriate technical systems, social practices and institutions, policies, and services. This presentation reports on outcomes from an investigation of research problems in data curation conducted as part of the Data Curation Education in Research Centers (DCERC) program. DCERC is developing a new model for educating data professionals to contribute to scientific research. The program is organized around foundational courses and field experiences in research and data centers for both master’s and doctoral students. The initiative is led by the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, in collaboration with the School of Information Sciences at the University of Tennessee, and library and data professionals at the National Center for Atmospheric Research (NCAR).
At the doctoral level DCERC is educating future faculty and researchers in data curation and establishing a research agenda to advance the field. The doctoral seminar, Research Problems in Data Curation, was developed and taught in 2012 by the DCERC principal investigator and two doctoral fellows at the University of Illinois. It was designed to define the problem space of data curation, examine relevant concepts and theories related to both technical and social perspectives, and articulate research questions that are either unexplored or undertheorized in the current literature. There was a particular emphasis on the Earth and environmental sciences, with guest speakers brought in from NCAR, National Snow and Ice Data Center (NSIDC), and Rensselaer Polytechnic Institute.
Through the assignments, students constructed dozens of research questions informed by class readings, presentations, and discussions. A technical report is in progress on the resulting research agenda covering: data standards; infrastructure; research context; data reuse; sharing and access; preservation; and conceptual foundations. This presentation will discuss the agenda and its importance for the geosciences, highlighting high priority research questions. It will also introduce the related research to be undertaken by two DCERC doctoral students at NCAR during the 2013-2014 academic year and other data curation research in progress by the doctoral DCERC team.

Outcomes of the Data Curation for Geobiology at Yellowstone National Park Workshop (Presented by A. K. Thomer)
A. Thomer, C. L. Palmer, B. W. Fouke, A. Rodman, G. S. Choudhury, K. S. Baker, A. E. Asangba, K. Wickett, T. DiLauro, V. Varvel

The continuing proliferation of geological and biological data generated at scientifically significant sites (such as hot springs, coral reefs, volcanic fields and other unique, data-rich locales) has created a clear need for the curation and active management of these data. However, there has been little exploration of what these curation processes and policies would entail. To that end, the Site-Based Data Curation (SBDC) project is developing a framework of guidelines and processes for the curation of research data generated at scientifically significant sites. A workshop was held in April 2013 at Yellowstone National Park (YNP) to gather input from scientists and stakeholders. Workshop participants included nine researchers actively conducting geobiology research at YNP, and seven YNP representatives, including permitting staff and information professionals from the YNP research library and archive. Researchers came from a range of research areas -- geology, molecular and microbial biology, ecology, environmental engineering, and science education.
Through group discussions, breakout sessions and hands-on activities, we sought to generate policy recommendations and curation guidelines for the collection, representation, sharing and quality control of geobiological datasets. We report on key themes that emerged from workshop discussions, including:
- participants’ broad conceptions of the long-term usefulness, reusability and value of data.
- the benefits of aggregating site-specific data in general, and geobiological data in particular.
- the importance of capturing a dataset’s originating context, and the potential usefulness of photographs as a reliable and easy way of documenting context.
- researchers’ and resource managers’ overlapping priorities with regard to "big picture" data collection and management in the long term.
Overall, we found that workshop participants were enthusiastic and optimistic about future collaboration and development of community approaches to data sharing. We hope to continue discussion of geobiology data curation challenges and potential strategies at AGU. Outcomes from the workshop are guiding next steps in the SBDC project, led by investigators at the Center for Informatics Research in Science and Scholarship and Institute for Genomic Biology at the University of Illinois, in collaboration with partners at Johns Hopkins University and YNP.

March 13, 2013
Research Showcase 2013
On Friday March 29, from 12:00PM - 5:00PM, CIRSS faculty and students will join GSLIS colleagues to present and share their research in a series of posters, presentations, and demonstrations. University of Illinois Vice Chancellor for Research Peter Schiffer will open the 2013 Research Showcase. The Research Showcase is an annual event open to campus and the general public.

The full program and event location can be found here:

See below for a list of CIRSS presentations, posters, and demonstrations.


HathiTrust Research Center: New Frontiers in Digital Scholarship
J. Stephen Downie, Craig Willis & Kahyun Choi

Site-Based Data Curation at Yellowstone National Park
Carole Palmer, Bruce Fouke, Ann Rodman, Sayeed Choudhury, Andrea Thomer, Karen Baker, Abby Asangba & Karen Wickett

GSLIS at the Text REtrieval Conference: Retrieving and Filtering Real-Time Data
Miles Efron


Identifying Claims in Social Science Literature
Shameem Ahmed, Catherine Blake, Kate Williams, Noah Lenstra & Qiyuan Liu

Describing the Quality of Research Datasets Across Disciplines: A Comparative Study
Tiffany C. Chao

Sustainable Software
Craig Evans & Jerome McDonough

On the Effect of Name Ambiguity on Measures of Large-Scale Co-Authorship Networks
Brent D. Fegley & Vetle I. Torvik

Enhancing Cultural Heritage Collections by Supporting and Analyzing Participation in Flickr
Jacob Jett, Megan Senseney & Carole L. Palmer

Location-Based Navigation: Combining OPAC Searching and 3D Visualization in a High-Density Storage Facility
Fredrick Kiwuwa Lugya & Michael B. Twidale

Site-Based Data Curation at Yellowstone National Park
Carole L. Palmer, Bruce Fouke, Ann Rodman, Sayeed Choudhury, Andrea Thomer, Karen Baker, Abby Asangba & Karen Wickett

When You Wish upon a Blog: How Collaborative Information Seeking can Interleave with CSCW
Aiko Takazawa & Michael B. Twidale

Completeness, Coverage, & Equivalence in Scientific Data Records
Andrea Thomer

Extending the Systematic Assertion Model for Humanities Research
Karen Wickett, David Dubin, Bridget Almas & Megan Senseney

Center for Informatics Research in Science and Scholarship
Carole L. Palmer, Director

HathiTrust Research Center
J. Stephen Downie, Director


The Illinois Distributed Museum Project: Engineering and Technology Innovations at the University of Illinois at Urbana Champaign
Michael B. Twidale, Susan Frankenberg, Tom Ackerman & Kelsey Heffren

Exploiting Structural Data for Music Exploration
Craig Willis, J. Stephen Downie, Kahyun Choi & David Bainbridge

January 15, 2014
CIRSS participation at IDCC14
CIRSS faculty and students will participate in the upcoming 9th International Digital Curation Conference (IDCC), February 24-27 in San Francisco. This year's theme, "Commodity, catalyst or change-agent? Data-driven transformations in research, education, business & society," will focus on how data-driven tools and services allow us to explore, manage, use, and benefit from the world around us.
Preparing the workforce for digital curation: The iSchools perspective
Organized by Carole L. Palmer
This panel will discuss the work of the study committee on Future Career Opportunities and Educational Requirements for Digital Curation, in relation to educational programs in iSchools and expected workforce trends.
Scientific research group data management practices and local data repositories
Baker, K. S.
Research groups in the earth and environmental sciences are beginning to address new expectations for data management and data sharing aimed at providing data access. Further, a variety of sizes and configurations of data repositories have emerged in the last decades. Preparing data for release to a digital repository requires development of new data management practices, particularly when scientific inquiry involves the heterogeneity of data associated with fieldwork in the natural sciences (Borgman, 2012; Parsons et al., 2011). This study focuses on repositories associated with project-oriented communities of data generators typically tied to a geographically specific site. What are the characteristics of research-oriented data repositories? What is their impact on scientific research groups (SRGs)? What decisions do scientists make about their data and data practices? Existing SRGs that have close ties with a local data repository provide an opportunity to investigate an array of data and repository arrangements.
Exploring description for research data in soil science journal publications
Chao, T.
Curating data collections in the classroom: lessons learned
Duerr, R. & Chao, T.
Data curation issues in transitioning a field science collection of long-term research data and artefacts from a local repository to an institutional repository
Kaplan, N. E., Draper, D. C., Paschal, D. B., Moore, J. C., Baker, K. S., & Swauger, S.
A long-term place-based research effort in ecology, such as the Shortgrass Steppe Long-Term Ecological Research (SGS-LTER) project, produces a plethora of research data, articles, and other artefacts. These materials represent an extensive knowledge base created by a collaborative community over time. After thirty years of continuous interdisciplinary support for research (Lauenroth and Burke, 2008) and data management (Stafford et al., 2002) for the shortgrass steppe of eastern Colorado, the SGS-LTER site is being decommissioned and will no longer be funded as an LTER site after 2014. Currently, one focus of SGS-LTER information management is to complete submission of all datasets in the SGS data repository to the LTER Network Information System (NIS; Baker et al., 2000; Michener et al., 2010) prior to the end of funding. A second high priority activity is partnering with the Colorado State University Institutional Repository (CSU IR) to ensure that collections of artefacts, digital data and other objects remain open and available to local researchers who will continue their research on the shortgrass steppe by other means and may seek to append, revise and use their data. The SGS-LTER presents an example of a project with a rich legacy of data and information, in a variety of forms and file types, which if preserved in a local repository will continue to support local research efforts as well as contribute to advancing our understanding of ecology through data use.
Data Curation Education in Research Centers: Formative evaluation results from 2012-2013 cohorts
Palmer, C.L., Thompson, C.A., Mayernik, M. S., Williams, V., & Allard, S.
A citation analysis of “Data Publications” in Earth systems science
Weber, N. & Mayernik, M.
Automating the classification of author contribution statements
Weber, N. & Thomer, A.

January 28, 2014
Carole Palmer to present webinar on Data Curation Basics for the National Institutes of Health Library
Carole Palmer, Director of CIRSS and Professor at GSLIS, will present a webinar on "Data Curation Basics" as part of the Data Literacy for Librarians webinar series hosted by the National Institutes of Health Library. The webinar will take place on Tuesday, January 28, 2014 from 10:15am – 11:30am CST.

Dr. Palmer's talk will provide an overview of methodology and core principles related to curating research and scientific data, including data curation profiles from several research domains.

February 12, 2014
Carole Palmer presents seminar on site-based curation for the CyberGIS Brown Bag Seminar series
Carole Palmer, Director of CIRSS and Professor at GSLIS, presented a talk titled “Optimizing Data Resources for Reuse: Site-Based Data Curation” as part of the CyberGIS Brown Bag Seminar series. The seminar was held on Tuesday, February 11 from 12:00pm to 1:00pm at the NCSA Building, on the University of Illinois campus.
Advances in the management and archiving of digital data are proceeding apace, but how valuable are the data accumulating in our repositories? In this presentation, I report on the Site-Based Data Curation (SBDC) project, a collaboration among information scientists, geobiologists, data archiving experts, and resource managers at Yellowstone National Park (YNP). Guided by indicators of value for research data, the project is developing a curation framework for data from scientifically significant sites. The curation guidelines and processes focus on description and organization of reusable datasets for scientific aims, while also satisfying YNP's needs to improve coordination of data collection and management of resources and research activity at YNP.

February 3, 2014
CIRSS researchers to present at iConference 2014 in Berlin
CIRSS faculty and researchers will make a strong showing as they share their expertise in scientific data curation, information sharing, and data curation education at the ninth annual iConference, March 4-7 in Berlin, Germany. Hosted by the Berlin School of Library and Information Science at Humboldt-Universität zu Berlin, this year's event will assemble scholars and researchers from around the globe as they explore the topic "Breaking Down Walls: Culture, Context, Computing". Traditional paper and poster sessions will be supplemented by workshops and Sessions for Interaction and Engagement, offering attendees direct and theoretical approaches to critical information issues in contemporary society.
Meeting Data Workforce Needs: Indicators Based on Recent Data Curation Placements
Carole L. Palmer, Cheryl A. Thompson, Karen S. Baker, & Megan Senseney
The field of library and information science has been steadily advancing data curation education and practice in response to workforce demands. This paper reports on a formative evaluation of the Specialization in Data Curation program at the University of Illinois, aimed at understanding job preparedness and work experiences of graduates and areas for improvement in data curation education. Survey results are complemented by analysis of placement patterns of graduates to date. Employment levels and career satisfaction were found to be high, with internships, practicum, and assistantships identified as key factors in employability. Duties in current positions emphasize liaison and consulting, user instruction, data management, metadata, and policy development. Recommended areas for further emphasis in the curriculum included computer programming and domain knowledge. About half of all placements were outside of academic libraries, with the second largest group in the corporate sector. Results overall suggest that a general LIS education is far from adequate for current data curation positions. As an evaluation of the earliest formal LIS program in the U.S. focused on the curation of research data, the study provides important evidence of actual data curation responsibilities in the current workforce and perceived educational gaps that can guide future planning, design, and improvement of programs to meet the escalating demand for data curation in the information professions.
Computational Assessment of the Impact of Social Justice Documentaries
Jana Diesner, Susie Pak (St. John’s University), Jinseok Kim, Kiumars Soltani, & Amirhossein Aleyasen
Documentaries are meant to tell a story, that is, to create memory, imagination and sharing (Rose, 2012). Moreover, documentaries aim to lead to change in people’s knowledge and/or behavior (Barrett & Leddy, 2008). How can we know if a documentary has achieved these goals? We report on a research project where we have been developing, applying and evaluating a theoretically-grounded, empirical and computational solution for assessing the impact of social justice documentaries in a scalable, robust and rigorous fashion. We leverage cutting-edge methods from socio-technical data analytics, namely natural language processing and network analysis, for this purpose and provide a publicly available technology (ConText) that supports these routines. In this paper, we focus on the theoretical foundations of this project, address our methodological and technical framework, and provide an illustrative example of the introduced solution.
Wiki As A Platform - Turning Dissemination into Collaboration
Craig S. Evans
In research projects, data collection and dissemination are considered as two discrete and independent activities. The focus is on the research question, and not on how to best collect, present and subsequently share data. Although most US funding agencies now require that researchers share data, the tools available to operationalize this requirement are lacking. We show how the open source MediaWiki system can provide a lightweight, collaborative, and inexpensive tool to support new data sharing practices. This note serves to illustrate how interactive data collection and dissemination supported by a Wiki server can be used by scientists both during the project and for subsequent dissemination.
How Databases Learn
Andrea K. Thomer & Michael B. Twidale
For at least the last 40 years, the relational database has been a fixture of the modern research laboratory -- used to catalog and organize specimens and petri dishes, as well as to organize and store research data and analyses; Manovich goes so far as to call them the “key form of cultural expression” in the computer age (1999). Yet, though there are numerous textbooks on database design and short-term maintenance, and a fair amount of LIS and CSCW literature exploring people’s use of, and on-going collaboration around, databases, there is still a need for deeper exploration of how these artifacts change, grow and are maintained in the long term, and how their very structure can affect their users’ work. Findings from this deeper, more extended exploration would have implications for not just data curation, preservation and management, but also for our understanding of actual, situated information organization practices and needs in science: designing for actual practice rather than for unrealistic idealization of these practices and needs. We draw inspiration, and our title, from Brand’s highly influential book, “How Buildings Learn” (1995). We believe many of the topics Brand discusses regarding buildings’ change and growth over time might usefully be applied to certain aspects of databases.
Extending Curation profiles to study enterprise-level data practices
Nicholas M. Weber & Carole L. Palmer

This poster presents preliminary work in adapting a ‘curation profiles’ approach to study data practices in a corporate enterprise setting. We outline important similarities and differences between the curation of basic vs. applied research data and present preliminary findings from a pilot study with design engineers at a multi-national corporation that manufactures heavy machinery. We show that reproducibility, quality vs. value, and discovery-driven quality control are key areas for the development of new curation services in this sector. We conclude with some future directions for extending the curation profiles project to new data-intensive workplace settings.
Identifying Descriptive Indicators for Research Data from Scientific Journal Publications
Tiffany Chao

In order to support the sharing and reuse of scientific research data, rich description about the data must be made available. Scientific journal publications are a potential resource in contributing contextual details about the collection, generation, use, and analysis of data critical for facilitating meaningful interpretation. This poster presents an exploratory study on what information related to data can be identified from published literature on soil science research. The preliminary findings reveal the range of information detailed about data within journal publications, including discussion of data sources, referenced techniques and processes applied to data, and description of how data variables were collected and derived. With the growth of digital data, these findings will contribute to the development of a systematic approach for enhancing description in data curation systems and services and fostering data reuse.
Using collections and worksets in large-scale corpora: Preliminary findings from the Workset Creation for Scholarly Analysis project
Harriett Green, Katrina Fenlon, Megan Senseney, Sayan Bhattacharyya, Craig Willis, Peter Organisciak, J. Stephen Downie, Timothy W. Cole, & Beth Plale (Indiana University)

Scholars from numerous disciplines rely on collections of texts to support research activities. On this diverse and interdisciplinary frontier of digital scholarship, libraries and information institutions must 1) prepare to support research using large collections of digitized texts, and 2) understand the different methods of analysis being applied to the collections of digitized text across disciplines. The HathiTrust Research Center’s Workset Creation for Scholarly Analysis (WCSA) project conducted a series of focus groups and interviews to analyze and understand the scholarly practices of researchers that use large-scale, digital text corpora. This poster presents preliminary findings from that study, which offers early insights into user requirements for scholarly research with textual corpora.
Using Named Entity Recognition as a Classification Heuristic
Andrea K. Thomer & Nicholas M. Weber

This poster proposes the use of Named Entity Recognition as a heuristic tool for improving manual document classification. This technique was developed as part of a project studying collaborative work via the acknowledgment statements found in a corpus of formally published journal articles. We demonstrate how uncertainty in our initial text mining results was ‘ground-truthed’ using Natural Language Processing tools in a quick-and-dirty fashion. To verify this technique’s validity, we offer some initial results from our larger study.
Digital Collection Contexts: Intellectual and Organizational Functions at Scale
Organizers: Carole L. Palmer, Karen Wickett, Antoine Isaac (Europeana)
This workshop will bring together experts from European and North American iSchools and projects developing large-scale digital cultural heritage collections. It will provide a forum for examining conceptual and practical aspects of “collections” and the context they provide in the digital environment, in relation to the information needs of scholars, roles of cultural institutions, and international interoperability. One-page position papers submitted by each panelist will be distributed to participants in advance of the workshop along with a whitepaper entitled “Modeling Cultural Collections for Digital Aggregation and Exchange Environments” under development by a team of researchers at Europeana and the Center for Informatics Research in Science and Scholarship. The workshop will be divided into two sessions, each including a panel of experts and a set of breakout discussions.

June 17, 2014
CIRSS faculty, staff travel to Oxford for digital humanities workshop

CIRSS faculty and staff have collaborated with the Oxford e-Research Centre to organize a five-day workshop to be held this July. The workshop, Data Curation and Access for the Digital Humanities, is part of Digital Humanities at Oxford Summer School 2014 and will introduce data curation concepts and practices to a group of students, faculty, and information professionals who work with humanities research data. Registration is open through June 23.

Interim Dean Allen Renear, Associate Dean for Research J. Stephen Downie, and Professor Carole Palmer will all travel to Oxford to present talks based on data curation research and education efforts in the Center for Informatics Research in Science and Scholarship (CIRSS), directed by Palmer, and activities of the HathiTrust Research Center, codirected by Downie.

Also presenting are Megan Senseney, senior project coordinator with CIRSS, who is co-organizing the event, and Nic Weber (MS ’10), GSLIS doctoral student.

“The workshop and internship program represent the first outcomes of an ongoing collaboration between GSLIS and Oxford. We are excited about this opportunity for greater international engagement across multiple areas of research and education,” said Downie.

Talks by GSLIS participants will include “Levels of Data Representation and Encoding,” “Unlocking the Potential of 3 Billion Books/Workflows and Research Objects,” and “Normalizing Metadata Using Open Refine.”

Other presentations will be made by staff from Oxford's Bodleian Library, Oxford e-Research Centre, and Oxford Internet Institute.

Renear and Palmer will also participate in the panel, “The Future of Data Access and Preservation,” chaired by David De Roure (Oxford e-Research Centre, University of Oxford). Other panel presenters include William Kilbride (Digital Preservation Coalition), Christine Madsen (Bodleian Libraries, Oxford), and Kenji Takeda (Microsoft).

GSLIS is also collaborating with the Bodleian Library and the Oxford e-Research Centre on a pilot internship program to provide students with first-hand experiences with data curation practices and problems in library and research center settings at Oxford. Two GSLIS master’s students, Elizabeth Wickes and Jamie Wittenberg, have been selected for the six-week internship. Wickes will work on a research data management workflow project with a subject librarian at the Oxford Forestry Institute. Together they will archive a set of data and publications and deposit them into the appropriate systems. Wickes will also work with the discovery team to facilitate discovery and access. Wittenberg will work on a project focused on research objects, which relates to the sharing, citation, and curation of bundles of digital artifacts to support reconstruction and reproducibility of research.

Originally posted to
