All Events Related to: Bertram Ludäscher

E-Research Roundtable: ERRT Annual Planning Meeting
Please bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it on the schedule, and updates to previous sessions.

CIRSS Seminar: Introduction to the Whole Tale Project
PIs Bertram Ludäscher, Matt Turk, and Victoria Stodden will present an introduction to the Whole Tale project, a five-year project newly funded by the NSF CC*DNI DIBBs program. The goal of the Whole Tale project is to enable researchers to examine, transform, and then seamlessly republish research data and code used in an article, in order to enable new discovery by allowing researchers to construct representations and syntheses of data. Presenters will provide an overview of project activities getting underway now.

The full project abstract is provided below:

Scholarly publications today are still mostly disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible from scholarly publications. The third layer is broad, encompassing numerous research communities through science pathways (e.g., in astronomy, life and earth sciences, materials science, social science), and deep, using interconnected cyberinfrastructure pathways and shared technologies.

The goal of this project is to strengthen the second layer of research output, and to build a robust third layer that integrates all parts of the story, conveying the holistic experience of reproducible scientific inquiry by (1) exposing existing cyberinfrastructure through popular frontends, e.g., digital notebooks (IPython, Jupyter), traditional scripting environments, and workflow systems; (2) developing the necessary 'software glue' for seamless access to different backend capabilities, including from DataNet federations and Data Infrastructure Building Blocks (DIBBs) projects; and (3) enhancing the complete data-to-publication lifecycle by empowering scientists to create computational narratives in their usual programming environments, enhanced with new capabilities from the underlying cyberinfrastructure (e.g., identity management, advanced data access and provenance APIs, and Digital Object Identifier-based data publications). The technologies and interfaces will be developed and stress-tested using a diverse set of data types, technical frameworks, and early adopters across a range of science domains.

* Biosketches

Bertram Ludäscher is Director of the iSchool's Center for Informatics Research in Science and Scholarship (CIRSS), and Professor at the iSchool, NCSA, and the Department of Computer Science. He conducts research in scientific data management, scientific workflows, and data provenance. His research interests also include foundations of databases, knowledge representation, and reasoning.  Ludäscher applies this work in a number of domains, including biodiversity informatics and taxonomy.

Matthew Turk is a research scientist at NCSA, interested in analysis and visualization of data, community-building around academic software, and the dynamics of open source ecosystems.

Victoria Stodden is an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign, with affiliate appointments in the School of Law, the Department of Computer Science, the Department of Statistics, the Coordinated Science Laboratory, and the National Center for Supercomputing Applications. She completed both her PhD in statistics and her law degree at Stanford University. Her research centers on the multifaceted problem of enabling reproducibility in computational science. This includes studying adequacy and robustness in replicated results, designing and implementing validation systems, developing standards of openness for data and code sharing, and resolving legal and policy barriers to disseminating reproducible research.

E-Research Roundtable: ERRT planning session
Bring your ideas for sessions that you would like to see on the schedule for this year. This includes new topics and ideas, as well as previously suggested sessions that have not yet made it on the schedule, and updates to previous sessions.

E-Research Roundtable: Data Curation for Biodiversity Informatics
There are an estimated 2-3 billion specimens held in natural history museums throughout the world; these specimens comprise an irreplaceable dataset documenting this planet’s changing biodiversity over time, and will be crucial to understanding how ecosystems will further change in a warming world. However, this dataset’s use is hindered by its distributed nature: these 2-3 billion data points are stored in thousands of museums, catalog ledgers, databases, and computer systems, and only a tiny fraction of them have even been digitized, let alone published in a usable electronic form.  An interdisciplinary informatics approach is needed to mobilize this data – one rooted not just in biology, but also in a sociotechnical understanding of computing systems.
In this ERRT, we’ll first present three GSLIS-affiliated biodiversity informatics projects that aim to take such an approach:
We’ll then open the session to the broader group for discussion and exploration of links to other LIS-adjacent fields (such as CSCW, data curation) and areas of interest (software and workflow reproducibility, provenance).

CIRSS Seminar: Euler: A Logic-Based Toolkit for Aligning and Reconciling Multiple Taxonomic Perspectives
(Joint work with UC Davis students and collaborators Nico Franz, Curator of Insects at ASU, and Shawn Bowers, Dept. of Computer Science at Gonzaga University)

Biological classifications and phylogenetic inferences often need to be updated and revised in light of new information. Such taxonomic revisions are usually done by domain experts who relate and align concepts of previously published taxonomies. Usually two (or more) taxonomies T1, T2 and a set of expert articulations A between them (i.e., set constraints relating taxa from T1 and T2) are given. The goal then is to find the "best merge" of T1 + T2 + A; ideally a single, unique result taxonomy T3. We have developed Euler/X, an open source toolkit to solve such taxonomy alignment problems using a number of underlying reasoning engines (e.g., we use X in {FO, ASP, RCC}, denoting First-Order logic, Answer Set Programming, and Region Connection Calculus reasoners, respectively). Euler can detect and diagnose inconsistent inputs and then suggest repairs, i.e., minimal changes to the user-provided input articulations that eliminate the logical inconsistency. On the other hand, if the user input is ambiguous, multiple possible merge results for T3 (i.e., multiple "possible worlds") are obtained and presented to the user. I will present ongoing work to improve the effectiveness and efficiency of both inconsistency repair and ambiguity reduction. The Euler toolkit also provides a nice illustration of the many faces and uses of provenance in practical data analysis tools. Time permitting, I will also give a brief overview of other related projects, e.g., to develop a data curation toolkit for biodiversity data.
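To make the ideas of inconsistent input and multiple "possible worlds" concrete, the following is a minimal illustrative sketch in Python. It is not Euler/X code and does not use its reasoners: it swaps in a brute-force search over a small, finite set of atoms. The two toy taxonomies (T1 with root A and children B, C; T2 with root D and children E, F), the function names, and the assumptions that siblings are disjoint and that children cover their parent are all hypothetical choices made for illustration. Because the search uses only a few atoms, it may miss worlds that need more atoms to realize.

```python
from itertools import product

# Cross-taxonomy leaf pairs whose relations make up one "possible world".
PAIRS = [("B", "E"), ("B", "F"), ("C", "E"), ("C", "F")]


def rcc5(x, y):
    """Return the RCC-5 relation between two non-empty sets."""
    if x == y:
        return "=="   # congruent
    if x < y:
        return "<"    # x is properly included in y
    if x > y:
        return ">"    # x properly includes y
    if x & y:
        return "><"   # partial overlap
    return "!"        # disjoint


def possible_worlds(articulations, n_atoms=4):
    """Enumerate distinct leaf-to-leaf relation tuples consistent with the input.

    Hypothetical toy setup: T1 = A(B, C), T2 = D(E, F). Each atom belongs to
    at most one leaf per taxonomy (siblings disjoint), and each root is the
    union of its children (coverage).
    """
    choices = list(product(["B", "C", None], ["E", "F", None]))
    worlds = set()
    for assignment in product(choices, repeat=n_atoms):
        taxa = {name: set() for name in "ABCDEF"}
        for atom, (t1_leaf, t2_leaf) in enumerate(assignment):
            if t1_leaf:
                taxa[t1_leaf].add(atom)
                taxa["A"].add(atom)
            if t2_leaf:
                taxa[t2_leaf].add(atom)
                taxa["D"].add(atom)
        if not all(taxa[leaf] for leaf in "BCEF"):
            continue  # every taxon must be non-empty
        if all(rcc5(taxa[x], taxa[y]) == rel for x, rel, y in articulations):
            worlds.add(tuple(rcc5(taxa[x], taxa[y]) for x, y in PAIRS))
    return worlds


# Articulations are (taxon, relation, taxon) triples.
unique = [("A", "==", "D"), ("B", "==", "E")]
inconsistent = unique + [("C", "!", "F")]
ambiguous = [("A", "==", "D")]

print("unique:", possible_worlds(unique))
print("inconsistent:", possible_worlds(inconsistent))
print("ambiguous:", len(possible_worlds(ambiguous)), "worlds")
```

In this sketch, an empty result corresponds to an inconsistent set of articulations, a single result to a unique merge, and several results to ambiguous input that would need further articulations to resolve.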
Bertram Ludäscher is a professor at the Graduate School of Library and Information Science (GSLIS). Prior to joining the iSchool at Illinois he was a professor at the Department of Computer Science and the Genome Center at the University of California, Davis. His research interests span the whole data-to-knowledge life cycle, from modeling and design of databases and workflows, to knowledge representation and reasoning. His current research focus includes both theoretical foundations of provenance and practical applications, in particular to support automated data quality control and workflow-supported data curation. He is one of the founders of the open source Kepler scientific workflow system, and a member of the DataONE leadership team, focusing on data and workflow provenance. Until 2004 Ludäscher was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty member in the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the Technical University of Karlsruhe (now: K.I.T.) and his PhD (Dr.rer.nat.) from the University of Freiburg, both in Germany.

E-Research Roundtable: NOTE DATE CHANGE.  Socio-Technical Data-Theoretic Advancements for Probabilistic Risk Assessment
We continue to witness large-scale system failures resulting in injuries, fatalities, adverse environmental consequences, and economic losses. Preventing these catastrophic accidents requires advancements in multidisciplinary risk analysis supported by collaborative research among academia, industry, and national labs. It demands the development of a common vocabulary within diverse engineering and social science domains in order to address risks emerging from the interface of social and technical systems. These collaborations will lead to improvements in socio-technical risk theories and the integration of deterministic and probabilistic techniques. The talk presents some of the speaker’s ongoing multidisciplinary research projects, focusing on the advancement of Probabilistic Risk Assessment (PRA) that provides input for risk-informed nuclear regulatory decision making. The theoretical contribution of these projects is the incorporation of two types of underlying phenomena into PRA: (1) physical failure mechanisms (e.g., to model fire risk; location-specific Loss of Coolant Accidents leading to Emergency Core Cooling System failure) and (2) social failure mechanisms (e.g., to model the effects of human and organizational factors on technical system failure; socio-technical risk-informed emergency preparedness, planning and response modeling for severe accidents). This incorporation helps identify and manage root causes of system failure and reduces unnecessary conservatism in nuclear power plant operation and design. The methodological contribution of these research projects relates to the integration of classical PRA techniques (i.e., Fault Trees and Event Trees) with simulation-based methods, leading to the development of an Integrated PRA (I-PRA) that enables the modeling of emergent risk behavior by depicting the dynamic interactions of risk contributing factors within their ranges of variability and uncertainty.
These cutting-edge PRA models are quantified with state-of-the-art big data analytics, which expand the classical approach of data extraction and implementation for risk analysis. The Big Data-Theoretic advancements of PRA (1) utilize big data analytics to address wide-ranging, incomplete, and unstructured data for risk assessment of complex systems and (2) are founded on physical and social failure theories, avoiding the risk of being misled by approaches informed solely by data. These theories support the completeness of contextual risk factors and the accuracy of their causal relationships. The talk will conclude by demonstrating the monetary value of PRA. Although the current projects focus on nuclear power applications, the newly developed theories and techniques can be implemented in other high-hazard industries such as oil and gas, aviation, and space.
Zahra Mohaghegh is currently an Assistant Professor in the Department of Nuclear, Plasma, and Radiological Engineering (NPRE) and an affiliate of the Department of Industrial and Enterprise Systems Engineering, Graduate School of Library and Information Science, and Illinois Informatics Institute at the University of Illinois at Urbana-Champaign. She is a recipient of the George Apostolakis early-career award in risk assessment and the Zonta International Award for her contribution to modeling large-scale complex systems. Dr. Mohaghegh is the director of the Socio-Technical Risk Analysis (SoTeRiA) research group at NPRE. She is the author of over thirty journal and conference papers on risk analysis and a member of the Society for Risk Analysis, American Nuclear Society, and American Society of Mechanical Engineers. Dr. Mohaghegh holds a Master’s and a Ph.D. in Reliability Engineering from the Mechanical Engineering Department at the University of Maryland.