All Events Related to: Catherine Blake


CIRSS Seminar: Research Trends in Digital Scholarship and Digital Humanities
2018-09-14
Scientists and humanists often bring profoundly different perspectives, even to the same research object. Consider how each of these communities describes the virtues and vices of a bridge. While an engineer will remain focused on the structural integrity and reliability of a bridge, a humanist will consider the evolution of social and cultural norms that are transformed after a bridge literally connects two communities. And while we typically applaud bridges that can resist force, bridges have also been deliberately designed to collapse under the weight of an invading army. The presentation will use two case studies from the digital humanities to illustrate the different perspectives that an engineer and a humanist bring to computational models. Despite these different perspectives, many of the big data trends that have been embraced in the sciences are now starting to emerge in the digital humanities.

Bio:
Catherine Blake is an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign with affiliate appointments in the Department of Computer Science and the Department of Medical Information Science. At the iSchool, she serves as associate director of the Center for Informatics Research in Science and Scholarship and is an active member of the Socio-technical Data Analytics group.

Her primary research goal is to accelerate scientific discovery by synthesizing evidence from text. Her techniques embrace both automated and human approaches that are required to resolve contradictions and redundancies that are inevitable in an information-intensive world.

Blake earned master’s and doctoral degrees in information and computer science at the University of California, Irvine, and bachelor’s and master’s degrees in computer science at the University of Wollongong.




E-Research Roundtable: Learning User-Defined, Domain-Specific Relations: A Situated Case Study and Evaluation in Plant Science
2015-11-04
Abstract
Although methods exist to identify well-defined relations, such as is_a or part_of, existing tools rarely support a user who wants to define new, domain-specific relations. We conducted a situated case study in plant science and introduce four new domain-specific relations that are of interest to domain scientists but have not been explored in information science. Results show that precision varies between relations: it ranges from 0.73 to 0.91 for the manufacturer location category, from 0.89 to 0.93 for the seed donor-bank relation, from 0.29 to 0.67 for the seed origin location, and from 0.32 to 0.77 for the field experiment location. Recall ranges from 0.91 to 0.94 for the manufacturer location category, from 0.93 to 1 for the seed bank-donor location, from 0.33 to 0.82 for the seed origin relation, and from 0.67 to 0.83 for the field experiment location. These ranges depend on the classifier used; all classifiers draw on a combination of lexical and syntactic features.
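The results above are reported as precision and recall per relation. As a minimal sketch, assuming each candidate instance carries one gold label and one predicted label, the two figures for a single relation can be computed as follows; the label names and toy data are hypothetical and are not drawn from the study.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for one relation label.

    gold and predicted are parallel lists holding one label per candidate
    instance; any label other than `label` counts as a negative.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels for one relation ("seed_origin") versus everything else.
gold = ["seed_origin", "none", "seed_origin", "seed_origin"]
predicted = ["seed_origin", "seed_origin", "none", "seed_origin"]
print(precision_recall(gold, predicted, "seed_origin"))
```

Here two of the three predicted seed_origin instances are correct and two of the three gold instances are found, so both values come out to 2/3.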


Bios
Ana Lucic is a doctoral student at the University of Illinois Graduate School of Library and Information Science. Her research goals involve extracting information from text that allows innovative ways of looking at, analyzing, and summarizing text.

Dr. Blake is an Associate Professor in the iSchool (GSLIS) at the University of Illinois at Urbana-Champaign with joint appointments in the Department of Computer Science and the Department of Medical Information Science. She serves as Associate Director of the Center for Informatics Research in Science and Scholarship (CIRSS), alongside Director Bertram Ludäscher, and is an active member of the Socio-technical Data Analytics (SODA) group. Her primary research goal is to accelerate scientific discovery by synthesizing evidence from text. Her techniques embrace both the automated and the human approaches that are required to resolve the contradictions and redundancies that are inevitable in the information-intensive world in which we live.




CIRSS Seminar: Summarization of Biomedical Texts by Utilizing Information Extracted from Comparative Sentences
2015-09-25
Comparison sentences represent a rhetorical structure that is commonly used to communicate the findings of an empirical study. Although comparison sentences make up only a small percentage of an article (around 5% of all sentences in the four biomedical collections examined), comparisons are rich with information about how the properties of one entity relate to those of a compared entity. The richness of information conveyed through comparison sentences, however, is generally underutilized. This dissertation proposal aims to tease out the crucial facets of a comparison claim (Blake, 2010), which represents an important step towards summarizing the information conveyed through comparative sentences across texts. The proposal analyzes the complexity of comparison sentences and suggests methods for identifying four crucial components of comparison sentences: the agent, the object, the basis of comparison, and the relation that binds the three entities.
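To make the four components concrete, the sketch below represents a comparison claim as a small record and groups claims from several texts by agent, object, and basis, which is one simple way to summarize comparisons across texts. The class, field names, and example claims are hypothetical illustrations, not the representation or method used in the proposal.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonClaim:
    """Hypothetical record holding the four components of a comparison."""
    agent: str     # the entity being compared
    obj: str       # the entity it is compared against (the "object")
    basis: str     # the property on which the comparison is made
    relation: str  # how the agent relates to the object on that basis


def summarize(claims):
    """Count the relations reported for each (agent, object, basis) triple."""
    groups = defaultdict(Counter)
    for claim in claims:
        groups[(claim.agent, claim.obj, claim.basis)][claim.relation] += 1
    return dict(groups)


# Hypothetical claims extracted from three different articles.
claims = [
    ComparisonClaim("treated mice", "control mice", "liver weight", "greater"),
    ComparisonClaim("treated mice", "control mice", "liver weight", "greater"),
    ComparisonClaim("treated mice", "control mice", "liver weight", "no difference"),
]
print(summarize(claims))
```

A grouped view like this surfaces both agreement and contradiction across texts: here, two reports of greater liver weight and one of no difference.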
 
Bios:
Ana Lucic is a doctoral student at the University of Illinois Graduate School of Library and Information Science. Her research goals involve extracting information from text that allows innovative ways of looking at, analyzing, and summarizing text.

Dr. Blake is an Associate Professor in the iSchool (GSLIS) at the University of Illinois at Urbana-Champaign with joint appointments in the Department of Computer Science and the Department of Medical Information Science. She serves as Associate Director of the Center for Informatics Research in Science and Scholarship (CIRSS), alongside Director Bertram Ludäscher, and is an active member of the Socio-technical Data Analytics (SODA) group. Her primary research goal is to accelerate scientific discovery by synthesizing evidence from text. Her techniques embrace both the automated and the human approaches that are required to resolve the contradictions and redundancies that are inevitable in the information-intensive world in which we live.




E-Research Roundtable: ERRT planning session
2015-08-26
Bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it onto the schedule, and updates to previous sessions.




CIRSS Seminar: iConference Paper Preview: Blake on "Evidence-based Discovery" and Lucic on "A Method to Automatically Identify the Results from Journal Articles"
2015-03-20
CIRSS Associate Director Cathy Blake will present a preview of her iConference 2015 paper "Evidence-based Discovery."  CIRSS PhD student Ana Lucic will present a preview of her iConference 2015 paper, co-authored with Blake and Gabb, "A Method to Automatically Identify the Results from Journal Articles."  Paper abstracts follow below.
 
Blake, C. (2015, March). Evidence-based Discovery. Paper to be presented at the 2015 iConference, Newport Beach, CA.
Both data-driven and human-centric methods have been used to better understand the scientific process. We describe a new framework, called evidence-based discovery, to bridge the gulf between the data-driven and human-centered approaches. Our goal is to provide a vision statement for how these (and other) approaches can be unified in order to better understand the complex decision-making that occurs when creating new knowledge. Despite the inevitable challenges, the combination of data-driven and human-centric methods is required to understand, characterize, and ultimately accelerate science.
 
Gabb, A. H., Lucic, A., & Blake, C.  (2015, March). A Method to Automatically Identify the Results from Journal Articles. Paper to be presented at the 2015 iConference, Newport Beach, CA.
The idea of automating systematic reviews has been motivated both by advances in technology that have increased the availability of full-text scientific articles and by sociological changes that have increased the adoption of evidence-based medicine. Although much work has focused on automating the information retrieval step of the systematic review process, with a few exceptions the information extraction and analysis steps have been largely overlooked. In particular, there is a lack of systems that automatically identify the results of an empirical study. Our goal in this paper is to fill that gap. We frame the problem as a classification task and employ three different objective, domain-independent feature selection strategies and two different classifiers. Special attention is paid to the selection of the data set, the feature selection metrics, and the classification algorithms and their parameters, in order to show the situatedness of this experiment and its dependence on each of these three factors.
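The abstract does not name its three feature selection strategies. As an illustration of one common objective, domain-independent strategy, the sketch below scores each term by the chi-square statistic of its 2×2 term/class contingency table and keeps the top k terms. Chi-square is a swapped-in example rather than necessarily one of the paper's strategies, and the toy sentences and the "result" label are hypothetical.

```python
def chi_square(docs, labels, term, positive):
    """Chi-square statistic for the association between `term` and class `positive`.

    docs is a list of token sets; labels is a parallel list of class labels.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == positive
        if has_term and in_class:
            a += 1  # term present, positive class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, positive, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, positive), t))
    return ranked[:k]


# Hypothetical sentences labeled as reporting a result or not.
docs = [
    {"significantly", "increased"},
    {"significantly", "reduced"},
    {"we", "used"},
    {"samples", "were", "used"},
]
labels = ["result", "result", "other", "other"]
print(select_features(docs, labels, "result", 2))
```

Note that chi-square also rewards strong negative associations: "used" ranks as high as "significantly" because it appears only in sentences that do not report results.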

Bios:
 
Cathy Blake is the Associate Director of CIRSS and associate professor of Library and Information Science. She conducts research on both human and automated processes surrounding information synthesis and discovery.  Her areas of research include socio-technical systems, text mining, evidence-based discovery, information synthesis, collaborative information behaviors, recognizing textual entailment, summarization, and meta-analysis.
 
Ana Lucic is a CIRSS PhD student at GSLIS.  Her research goals involve identifying and extracting semantic relations from textual data that in turn allow innovative ways of interacting with, analyzing, and understanding text.  The methods used include text mining, natural language processing, information retrieval and content analysis.




E-Research Roundtable: Discussion session -- Obtaining funding from foundations, industry and new sponsors
2014-04-30
As government sources of grant funding shrink, researchers may look to expand their pool of sponsors. GSLIS faculty who have obtained research funding from foundations, industry, and other sponsors less familiar to GSLIS will discuss their experiences and answer questions, with the overall goal of helping participants become more familiar with these potential funding sources. Mark Nolan of the Office of Corporate Relations will be on hand to discuss university-industry partnerships and to share information about the resources available from his office.




E-Research Roundtable: The Current State of the Claim Framework
2014-04-09
First published in 2010, the Claim Framework for identifying claims in scientific articles has evolved from an initial set of manual annotations into an automated approach that uses syntactic and semantic features of the text to identify various types of claims. The most recent version of the Claim Framework captures a significant amount of information about how the authors of scientific articles communicate the findings of empirical studies. Our talk will address the Framework's many applications, including reducing dimensionality in information retrieval tasks, identifying the key findings of an article to improve document summarization, establishing study-design parameters for indexing purposes, and identifying the natural language features of different claim types.




CIRSS Seminar: Identifying Comparative Claim Sentences in Full-Text Scientific Articles
2012-06-29
Comparisons play a critical role in scientific communication by allowing authors to situate their work in the context of earlier research problems, experimental approaches, and results. Our goal is to identify comparison claims automatically from full-text scientific articles. In this paper, we introduce a set of semantic and syntactic features that characterize a sentence and then demonstrate how those features can be used in three different classifiers: Naïve Bayes (NB), a Support Vector Machine (SVM), and a Bayesian network (BN). Experiments were conducted on 122 full-text toxicology articles containing 14,157 sentences, of which 1,735 (12.25%) were comparisons. Experiments show F1 scores of 0.71, 0.69, and 0.74 on the development set and 0.76, 0.65, and 0.74 on a validation set for the NB, SVM, and BN, respectively.
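Of the three classifiers, Naïve Bayes is the simplest to sketch. The example below is a minimal multinomial Naïve Bayes with add-one smoothing over bag-of-words tokens, plus the F1 computation used to report results. It assumes plain tokens rather than the paper's semantic and syntactic features, does not cover the SVM or Bayesian network, and uses hypothetical toy sentences.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train multinomial Naive Bayes with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set().union(*counts.values())  # every token seen in training
    totals = {c: sum(counts[c].values()) for c in classes}
    return classes, log_prior, counts, vocab, totals


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    classes, log_prior, counts, vocab, totals = model

    def score(c):
        return log_prior[c] + sum(
            math.log((counts[c][t] + 1) / (totals[c] + len(vocab)))
            for t in tokens if t in vocab
        )

    return max(classes, key=score)


def f1(gold, predicted, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical training sentences, already tokenized.
train_docs = [
    ["x", "was", "higher", "than", "y"],
    ["a", "was", "lower", "than", "b"],
    ["samples", "were", "collected"],
    ["mice", "were", "housed"],
]
train_labels = ["comparison", "comparison", "other", "other"]
model = train_nb(train_docs, train_labels)
print(predict(model, ["higher", "than"]))   # comparison
print(predict(model, ["samples", "were"]))  # other
```

Working in log space avoids numeric underflow when many token probabilities are multiplied together.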




E-Research Roundtable: GSLIS Work Force Study Requirements: The Past, Present, and Future
2012-02-01
There was (perhaps) a time when the training that you received during your first degree was sufficient to sustain your entire career. The information age has drastically altered the time frame for re-tooling, particularly in the field of library and information science. The goal of this e-research roundtable is to identify GSLIS infrastructure requirements that will (a) provide a resource for GSLIS students to align their interests with positions and curricula and (b) enable longitudinal analyses of workforce needs and issues. Achieving this goal will require a collective GSLIS effort that includes faculty, staff, and students. Dr. Blake will lead this working session by providing a framework that would enable us to leverage text mining for the initial activities. Faculty, staff, and students who have conducted workforce analyses are particularly welcome.




E-Research Roundtable: Mining Biomedical Multiple-Ontology Patterns
2011-11-30
Ontologies provide a powerful way to organize information and understand the world in which we live. Despite their usefulness, articulating a complete and consistent ontology is difficult and time-consuming. Moreover, knowledge evolves, and the effort required to keep an ontology current, and thus relevant, is often underestimated. Our goal in this project is to infer new ontological concepts from an existing ontology and full-text documents. In this presentation we focus on the Unified Medical Language System (UMLS), which maps biomedical concepts to surface-level features in text (words). For example, the kidney cancer concept in the UMLS includes the phrases ‘cancer of kidney’ and ‘kidney cancer’, which can be framed as the text transformation X of Y → Y X. We also explore word patterns and transformations between parents and children in the UMLS. For example, kidney cancer and breast cancer are both children of cancer and take the form <body part> cancer. We report frequent patterns in the UMLS and preliminary results on text.
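The two patterns described above can be sketched with plain string operations: rewriting X of Y as Y X, and abstracting the children of a parent concept into a slot pattern such as <body part> cancer. This is a minimal sketch under assumptions: the `semantic_types` dictionary is a hypothetical stand-in for UMLS semantic types, and no actual UMLS files or APIs are used.

```python
import re
from collections import Counter

# Matches phrases of the form "X of Y", e.g. "cancer of kidney".
OF_PATTERN = re.compile(r"^(?P<x>.+?) of (?P<y>.+)$")


def of_to_compound(phrase):
    """Apply the transformation X of Y -> Y X; return None if it does not apply."""
    match = OF_PATTERN.match(phrase)
    if match is None:
        return None
    return f"{match.group('y')} {match.group('x')}"


def child_pattern(parent, child, semantic_types):
    """Abstract a child such as 'kidney cancer' into '<body part> cancer'.

    Returns None unless the child is a modifier followed by the parent term.
    Modifiers missing from the hypothetical semantic_types map become <X>.
    """
    suffix = " " + parent
    if not child.endswith(suffix):
        return None
    modifier = child[: -len(suffix)]
    return f"<{semantic_types.get(modifier, 'X')}> {parent}"


def frequent_patterns(parent, children, semantic_types):
    """Count the slot patterns produced by a parent's children."""
    patterns = (child_pattern(parent, c, semantic_types) for c in children)
    return Counter(p for p in patterns if p is not None)


print(of_to_compound("cancer of kidney"))  # kidney cancer
semantic_types = {"kidney": "body part", "breast": "body part"}  # hypothetical
children = ["kidney cancer", "breast cancer", "skin cancer", "cancer of kidney"]
print(frequent_patterns("cancer", children, semantic_types))
```

Because the regular expression splits at the first " of ", a phrase such as 'cancer of the kidney' would yield 'the kidney cancer'; a fuller treatment would need to handle determiners and nested phrases.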




E-Research Roundtable: Technology's Positive Impact on the Cultural Heritage of Native American Tribes
2011-02-16
At this session, Biagio Arobba will introduce his background in semantic middleware and Native American communities, discuss his heritage (and answer any questions), and explain his interest in social media, the Web, and mobile devices and why he believes they will help Native American communities with culture, language, and heritage preservation.

Semantic middleware, originally developed for e-science, has the potential to be transformational for Native American communities. We all know (or assume) that Native American languages are disappearing. You might be surprised to learn that over half of the pre-colonial Native American languages in the United States are still spoken today, but that number is changing dramatically. Many Native American people are concerned with the disappearance of their spoken languages, and there is a desire for ... something ... to help the people in Native American communities, and local government organizations, increase fluency among their peers.

There are both problems and opportunities. For example, many places in the United States are resistant to multilingual education, and working with local government and tribal agencies can be a nightmare. On the other hand, tribes in the United States have far better access to digital media and the Internet than would a community in the Amazon rain forest. Additionally, Native American children in either tribal communities or communities with relatively high Native American populations are drawn to social media, gaming, and mobile devices. The majority of elders and young parents want today's generation to have the benefits of the digital age.

Also, there is a great deal of research on, and many methods for, teaching major world languages, but many of these techniques aren't quite right for smaller minority language communities. In recent years, a growing number of tools have been popping up across the Internet (possibly because more attention is being paid to minority languages, or simply because computers, best practices, and the Internet have reached the critical mass needed to make this possible). In his work, Mr. Arobba looks for ways to avoid reinventing the database for every application, to reduce time-to-deployment, and to make user interfaces easier for everyday users.

Resources:

Arobba, B., R.E. McGrath, J. Futrelle, and A.B. Craig, "A Community-Based Social Media Approach for Preserving Endangered Languages and Culture" In: "The Changing Dynamics of Scientific Collaborations" workshop at 44th Hawaii International Conference on System Sciences, January 3, 2011.

Live and Tell




E-Research Roundtable: ERRT Planning Session
2010-09-15
Please bring your ideas about roundtable sessions that you would like included on the schedule. This includes completely new ideas, as well as previously suggested sessions that have not made it onto the schedule.




E-Research Roundtable: The Claim Framework (Part 2)
2010-04-14
Resources: http://bibapp.org/

Description: [see Part 1 description]




E-Research Roundtable: The Claim Framework (Part 1)
2010-03-10
Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims automatically and measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.
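The last step, measuring how much each feature contributes to precision and recall, can be illustrated with a leave-one-feature-out ablation. In this minimal sketch, assuming each sentence has already been tagged with boolean feature values, a sentence is labeled an explicit claim only when every enabled feature fires; the feature names, the conjunction rule, and the toy data are hypothetical and do not reproduce the paper's system.

```python
def evaluate(sentences, features):
    """Return (precision, recall) when a claim requires every enabled feature to fire."""
    tp = fp = fn = 0
    for feature_values, is_claim in sentences:
        predicted = all(feature_values[f] for f in features)
        if predicted and is_claim:
            tp += 1
        elif predicted:
            fp += 1
        elif is_claim:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def ablation(sentences, features):
    """Score the full feature set, then the set with each feature removed in turn."""
    results = {"all features": evaluate(sentences, features)}
    for removed in features:
        remaining = [f for f in features if f != removed]
        results[f"without {removed}"] = evaluate(sentences, remaining)
    return results


# Hypothetical pre-tagged sentences: (feature values, gold label).
sentences = [
    ({"semantic": True, "syntactic": True}, True),
    ({"semantic": True, "syntactic": False}, False),
    ({"semantic": False, "syntactic": True}, False),
    ({"semantic": True, "syntactic": True}, True),
]
for setting, (p, r) in ablation(sentences, ["semantic", "syntactic"]).items():
    print(f"{setting}: precision={p:.2f}, recall={r:.2f}")
```

In this toy data, dropping either feature lets one false positive through, so precision falls from 1.00 to 0.67 while recall stays at 1.00: a small analogue of the finding that semantics and syntax work best in combination.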

Resources: Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles