CIRSS Speaker Series, Fall 2024: Studying Science Scientifically

The CIRSS speaker series continues in Fall 2024 with a new theme of “Studying Science Scientifically: State of the Art and Prospects for the Science of Science.” With the availability of increasingly rich data sources, exciting new technologies for understanding natural language, and modeling methodologies taken from diverse domains of scholarship, the opportunities to observe, measure, and model the structure and dynamics of the scientific enterprise abound as never before. Presentations in this series will illustrate the breadth of advances that have been made, and are yet to be made, by researchers in the information, computing, and social sciences among others in this blossoming field.

We meet most Fridays, 11am-noon US Central Time, on Zoom. Our Fall series will be led by Yuanxi Fu and Timothy McPhillips. This event is open to the public, and everyone is welcome to attend. The series is hosted by the Center for Informatics Research in Science and Scholarship (CIRSS) of the School of Information Sciences at the University of Illinois at Urbana-Champaign. If you have any questions, please contact Janet Eke.

Participate: To join a live session, follow the “Join Here” link for the current week below to access the iSchool event page for the talk. There click the “PARTICIPATE online” button to join the live Zoom session. Recordings of past talks can be found via the “Recording” links below if available.

Follow: To receive weekly updates on upcoming talks, subscribe to our CIRSS Seminars mailing list at https://lists.ischool.illinois.edu/lists/info/cirss-seminars. Subscribe to add events to your calendar via Google Calendar or Outlook.

Fall 2024 Speakers

Santo Fortunato, Indiana University Bloomington
Friday September 13, 2024, 11am-noon CT
Title: Navigating the new science of science: impact, collaboration, excellence

Abstract: Science of science is the investigation of science as a system, via analysis and modeling of data on scientists and their interactions. I will present results from our group on three key pillars of the discipline: impact, collaboration, and excellence. On impact, we found that the distributions of citations of papers published in the same discipline and year rescale to a universal curve, by properly normalizing the raw number of cites. I will show that active authors in a certain field may induce their collaborators to work in that field, especially if they are highly productive and cited. Also, I will discuss the impact of the COVID-19 pandemic on scientific collaboration. Finally, I will show that the lag between the year of the Nobel discovery and the year of the award has been growing exponentially over the years. What does this mean for science? No Nobels = No progress?

Bio: Santo Fortunato is a Professor at the Luddy School of Informatics, Computing, and Engineering of Indiana University. Previously he was professor of complex systems at the Department of Computer Science of Aalto University, Finland. Prof. Fortunato got his PhD in Theoretical Particle Physics at the University of Bielefeld in Germany. His focus areas are network science, especially community detection in graphs, computational social science and science of science. His research has been published in leading journals, including Nature, Science, Nature Physics, PNAS, Physical Review Letters, Physical Review X, Reviews of Modern Physics, and Physics Reports, and has collected over 47,000 citations (Google Scholar). His single-author article Community detection in graphs (Physics Reports 486, 75-174, 2010) is one of the best known and most cited papers in network science. Fortunato received the Young Scientist Award for Socio- and Econophysics 2011, a prize given by the German Physical Society, for his outstanding contributions to the physics of social systems. He is a Fellow of the Network Science Society (2022) and of the American Physical Society (2022). He is the Founding Chair of the International Conference of Computational Social Science (IC2S2), which he first organized in Helsinki in June 2015. He was Chair of Networks 2021, the largest ever event on network science, a historical merger of the NetSci and Sunbelt conferences. He is the author of the book A First Course in Network Science, by Cambridge University Press (2020), the most accessible textbook on the new science of networks.

Thomas Stoeger, Northwestern University
Friday September 20, 2024, 11am-noon CT
Title: Science of Science as a Tool for Biomedical Discovery

Abstract: Biomedical research has traditionally concentrated on a small subset of genes that were extensively studied in the 1980s and 1990s. This focus has led to surprising gaps in our knowledge: typically, half of the genes that are important to disease, according to unbiased data, have never been mentioned in any research articles. Since this gap persists despite being noted two decades ago, my research seeks to understand why this lack of investigation continues. Building on these insights, I have developed hypotheses on how to effectively study a broader set of genes. To test these hypotheses, I took a significant career risk by personally applying them to a scientific field in which I had no prior experience. This effort led to the discovery of Gene Length-dependent Transcription Decline, a molecular phenomenon that explains changes in gene activity during human aging. Lastly, I will briefly present an unpublished AI-enabled investigation of the historical archives of the Human Genome Project and the National Human Genome Research Institute.

Bio: Thomas Stoeger is an Assistant Professor in the Division of Pulmonary and Critical Care at Northwestern University, where he established his laboratory in October 2023. He previously joined Northwestern as a data science scholar for his postdoctoral research. He graduated from the University of Zurich in 2016, where he received the annual award for the best PhD thesis in the sciences. His postdoctoral research earned him the K99/R00 Postdoc-to-Tenure-Track Award from the National Institute on Aging.

Liubov Tupikina, ITMO, Bell Labs, Paris Descartes LPI
Friday October 4, 2024, 11am-noon CT
Title: Dissecting knowledge space: learning higher-order structures from data

Abstract: An active area of research in AI is the theory of manifold learning: finding lower-dimensional manifold representations in order to learn geometry from data and provide better-quality curated datasets. There are, however, various issues with these methods related to finding low-dimensional representations of the data, the so-called curse of dimensionality. Geometric deep learning methods often include a set of assumptions on the geometry of the feature space. These assumptions include pre-selected metrics on the feature space and the use of an underlying graph structure that encodes the proximity of data points. However, the latter assumption of using a graph as the underlying discrete structure encodes only binary pairwise relations between data points, preventing us from capturing the more complex higher-order relationships that are often present in various systems. These assumptions on the data, together with the data being discrete and finite, may cause some generalisation, which may lead to wrong interpretations of the data and of the models that produce the embeddings of the data itself (such as BERT and others).

Bio: Liubov Tupikina (ITMO, Bell Labs, Paris Descartes LPI) is a researcher in computer science, mathematics and physics of complex systems. She has a PhD in theoretical physics, working on representation of dynamical systems using graph theory, and has worked on stochastic processes on graphs and hypergraphs. She now works on embeddings theory, low-dimensional data representations, higher-order mathematical structures representations of data encoded systems and hypergraphs encoding using algebraic theory (broadly explainable AI area). More information is on her website.

Vincent Larivière, Université de Montréal
Friday October 11, 2024, 11am-noon CT
Title: Are self-citations a normal feature of knowledge accumulation?

Abstract: Science is a cumulative activity, which can manifest itself through the act of citing. Citations are also central to research evaluation, thus creating incentives for researchers to cite their own work. Using a dataset containing more than 63 million articles and 51 million disambiguated authors, this talk will examine the relative importance of self-citations and self-references in the scholarly communication landscape, their relationship with the age and gender of authors, as well as their effects on various research evaluation indicators. Results show that self-citations and self-references evolve in different directions throughout researchers’ careers, and that men and older researchers are more likely to self-cite. Although self-citations have, on average, a small to moderate effect on authors’ citation rates, they highly inflate citations for a subset of researchers. Comparison of the abstracts of cited and citing papers to assess the relatedness of different types of citations shows that self-citations are more similar to each other than other types of citations, and therefore more relevant. However, researchers who self-reference more tend to include less relevant citations. The talk will conclude with a discussion of the role of self-citations in scholarly communication.

Bio: Vincent Larivière holds the UNESCO Chair on Open Science at the Université de Montréal, where he is professor of information science and associate vice-president (planning and communications). He is also scientific director of the Érudit journal platform, associate scientific director of the Observatoire des sciences et des technologies (OST), and regular member of the Centre interuniversitaire de recherche sur la science et la technologie (CIRST). He holds a B.A. in Science, Technology and Society (UQAM), an M.A. in history of science (UQAM) and a Ph.D. in information science (McGill), and has performed postdoctoral work at Indiana University’s Department of Information and Library Science.

Meicen Sun, University of Illinois at Urbana-Champaign
Friday October 18, 2024, 11am-noon CT
Title: Damocles’ Switchboard: Information Externalities and the Autocratic Logic of Internet Control

Abstract: This paper advances a theory for the autocratic logic of internet control. Politically motivated internet control generates a positive externality for domestic data-intensive firms and a negative externality for domestic knowledge-intensive research entities. Exploiting a major internet control shock in 2014, I find that Chinese data-intensive firms gained 26 percent in revenue over other Chinese firms as the result of internet control. The same shock incurred a 10-percent decline in research quality from Chinese researchers, conditional on the knowledge-intensity of their discipline. It also reduced the research quality from Chinese researchers relative to their US counterparts by 22 percent in all disciplines. Due to the positive data externality, internet control enacted to prevent domestic threats challenges the state’s competing need for data sovereignty against foreign threats. Meanwhile, the state shields certain foreign knowledge-intensive actors from the negative knowledge externality to avoid the immediate economic costs they might otherwise impose. Qualitative evidence supports both implications, highlighting the centrality of short-term interests and foreign actors in autocratic decision-making.

This paper is published in International Organization here: https://doi.org/10.1017/S0020818324000237.

The presentation will conclude with a discussion on the role of knowledge diffusion in internet control’s impact on research. 

Bio: Meicen Sun is an assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. Her research examines the political economy of information, the geopolitics of data, and information policy. Her writings have appeared in academic and policy outlets, including International Organization, Foreign Policy Analysis, Harvard Business Review, World Economic Forum, and the Asian Development Bank Institute. She had previously conducted research at the Center for Strategic and International Studies and at the UN Regional Centre for Peace and Disarmament in Africa. Bilingual in English and Chinese, she has also written stories, plays, and music and staged many of her works — in both languages — in China, Singapore and the U.S. Sun served as a Fellow on the World Economic Forum’s Global Future Council on China and is an affiliated faculty with MIT FutureTech. She holds an AB with honors from Princeton University, an AM with a Certificate in Law from the University of Pennsylvania, and a PhD from the Massachusetts Institute of Technology followed by a postdoctoral fellowship at Stanford University.

Maksim Kitsak, Delft University of Technology
Friday November 8, 2024, 11am-noon CT
Title: Modeling and Inference of Complementarity Mechanisms in Networks

Abstract: In many networks, including networks of protein-protein interactions, interdisciplinary collaboration networks, and semantic networks, connections are established between nodes with complementary rather than similar properties. What is complementarity? The Oxford Dictionary asserts that “two people or things that are complementary are different but together form a useful or attractive combination of skills, qualities or physical features.” Sadly, our understanding of complementarity in networks does not go far beyond definition. While complementarity is abundant in networks, we lack mathematical intuition and quantitative methods to study complementarity mechanisms in these systems. Instead, we routinely retreat to using available off-the-shelf methods developed in the first place for similarity-driven networks.

In my talk, I will discuss my group’s recent achievements in the analysis of complementarity mechanisms in networks. I will first explain why existing similarity-based inference and learning methods are not readily applicable to systems where complementarity between interacting nodes plays a significant role. I will then deduce, starting with the definition by the Oxford Dictionary, a general complementarity framework for networks capable of describing any matching relations and containing both similarity and antitheses relations as special cases. Using the general framework, I will formulate a minimal null model to learn complementarity embeddings of real networks via maximum-likelihood estimation. I will demonstrate how complementarity embeddings can be used to infer both complementary and similar nodes in a network, enabling network inference tasks, such as link prediction and community detection. Armed with the new intuition and methods, I will examine collaboration patterns in co-authorship networks of different disciplines, demonstrating that both similarity and complementarity principles are at play there, albeit in varying proportions. I will conclude my talk with an outlook on the interplay of similarity and complementarity in the formation of networks, arguing for a careful re-evaluation of existing similarity-inspired methods.

Bio: Maksim Kitsak is an Associate Professor of the Electrical Engineering, Mathematics, and Computer Science faculty of the Delft University of Technology, the Netherlands. Prof. Kitsak has been working at the intersection of Network Theory, Machine Learning, and Statistical Physics. Prof. Kitsak is particularly interested in the fundamental principles behind non-Euclidean network embeddings and novel applications of network embeddings in communication and biological networks. His research is often published in prestigious journals, such as Nature and Science Families. Prof. Kitsak gratefully acknowledges the financial support of the National Science Foundation (NSF, USA), Army Research Office (ARO, USA), and the Dutch Research Council (NWO, NL).

Alexander Furnas, Northwestern University
Friday November 15, 2024, 11am-noon CT
Title: Partisan Disparities in the Use, Production, and Funding of Science in the United States

Abstract: Science, long considered a cornerstone in shaping policy decisions, is increasingly vital in addressing contemporary societal challenges. However, it remains unclear whether science is used differently by policymakers with different partisan commitments, whether scientists with different partisan commitments produce substantively different science, or whether partisan policymakers in the federal government fund science at different levels. Here we combine large-scale datasets capturing science, policy, and their interactions, to systematically examine the partisan differences in the use, production, and funding of science in the United States. We find that the use of science in policy documents has featured a roughly 75 percent increase over the last 25 years, highlighting science’s growing relevance in policymaking. However, the pronounced increase masks stark and systematic partisan differences in the amount, content, and character of science used in policy. Democratic-controlled congressional committees and left-leaning think tanks cite substantially more science, and more impactful science, compared to their Republican and right-leaning counterparts. Moreover, the two factions cite substantively different science, with partisans citing the same papers less than half as often as expected from a null model. Indeed, partisans cite notably different science from each other even when they are addressing the same policy area. We find that the uncovered large partisan disparities are rather universal across time, scientific fields, policy institutions, and issue areas, and are not simply driven by differing policy agendas. Probing potential mechanisms, we field an original survey of over 3,000 political elites and policymakers, finding substantial partisan differences in trust toward scientists and scientific institutions, potentially contributing to the observed disparities in science use. 
Some of the differences we observe in partisan policymakers’ use of science may be driven by supply-side factors, as we observe substantive differences in the topics that scientists with different partisan affiliations themselves produce. This result is robust across all fields of science. Finally, using novel federal account-level appropriations data, we demonstrate that between 1980 and 2020, Republicans have funded science and research-related accounts at a higher level than their Democratic counterparts. These results are not restricted to the Department of Defense, but rather are spread across multiple agencies, with Republican control of the House of Representatives being particularly influential in higher levels of science funding.

Bio: I’m a Research Assistant Professor at the Center for Science of Science and Innovation in the Kellogg School of Management, Northwestern University, a Faculty Associate at the Institute for Policy Research, and the Ryan Center on Complexity. I received a Ph.D. in Political Science at the University of Michigan in 2020 in American Politics (major subfield) and Quantitative Methods (minor subfield). I specialize in the role of information and expertise in policymaking. My dissertation examined the conditions under which Congress uses privately provisioned information produced by outside organizations in the policymaking process. More generally, I study interest groups, Congress, the intersection of science and politics, policymaking and elite political behavior using survey, text analysis and network methods. I also have ongoing research projects on congressional staff capacity, interest group ideal point estimation, lobbying firms, and text reuse detection.

Chaoqun Ni, University of Wisconsin-Madison
Friday November 22, 2024, 11am-noon CT
Title: Tenure and Research Trajectories of U.S. Professors

Abstract: Tenure is a foundational element of the U.S. academic system, yet its influence on faculty research trajectories remains largely unexplored. Theoretically, tenure systems may function as a selection mechanism that favors high-output researchers, as a dynamic incentive that drives high productivity before tenure but lowers it afterward, and as a creative search process that encourages tenured faculty to engage in high-risk research. In this study, we integrate data from seven sources to analyze the research outputs of over 12,000 tenure-line faculty across 15 disciplines, providing a comprehensive view of research trajectories on an unprecedented scale. Our findings indicate that publication rates typically increase significantly throughout the tenure track, peaking just before tenure. Post-tenure trends, however, vary by field. Examining creative search behaviors, we find that post-tenure faculty increasingly pursue novel, high-risk research, although this shift is accompanied by a decrease in impact, with fewer high-citation papers produced. Comparisons across career stages and between tenure-based and non-tenure-based settings further highlight that research trajectory shifts are closely aligned with tenure timing. These results offer a new empirical perspective on the tenure system, faculty research pathways, and the patterns of scientific output over academic careers.

Bio: Dr. Chaoqun Ni is an Assistant Professor at the University of Wisconsin-Madison’s Information School, where she co-directs the Metascience Research Lab. She also holds appointments at the Data Science Institute, Holtz Center for Science & Technology Studies, and Center for Demography of Health and Aging. She studies science and scholarship, with the goal of exploring the effects of practice and policy-related factors on the cultivation of a competitive scientific workforce. Her research has been published in high-impact journals such as Nature, Science Advances, and eLife. She holds a Ph.D. in Information Science from Indiana University Bloomington.

Sarah Bratt, University of Arizona
Friday December 6, 2024, 11am-noon CT
Title: Uncovering the Invisible Engines of Data-Intensive Science: Science of Science Studies of Datasets, Software, and Full-Text

Abstract: Science studies scholars have long studied the outputs of science: publications and patents. But, as Sabina Leonelli has pointed out, science is comprised of multifarious intermediate products and processes of science: datasets and software are the tissue that holds together a functioning scholarly universe. Like dark matter, the intermediary products of science have strong influence on the scholarly ecosystem, even though they may be difficult to detect and measure. Their presence must be inferred, often from the contexts in which they are embedded and the “spaces in between.”

In this talk, we will discuss ongoing projects on quantifying the intermediary products of science. The first project develops a text-based measure of humility in scientific inquiry, using NLP and computational grounded theory techniques to identify humble inquiry in the full text of scientific articles. Second, we turn to the use and circulation of genomic datasets in the global north and south, and finally, discuss the linguistic clustering patterns of communities using scientific software in phylogenetics. This talk demonstrates the use of novel science of science datasets (OpenAlex, SciSciNet, GenBank) and methods in science of science (computational grounded theory), and I hope it will be of interest to sociologists of science and knowledge, information scholars, and computational social scientists among others!

Bio: Dr. Sarah Bratt, PhD, is an Assistant Professor at the University of Arizona College of Information Science (iSchool). She holds a B.S. in Philosophy from Ithaca College and M.S. in Library and Information Science with a Data Science certificate from Syracuse University. Her research lies at the intersection of scholarly communication, research data management, and science of science. The overarching goal of her research is to understand and design for long-term research data sustainability and actionable science policy. Her research has been published in Quantitative Science Studies (QSS), Journal of Informetrics, and Scientometrics. She was a research Fellow at the Laboratory of Innovation Science at Harvard (LISH) and a Fellow at the iSchool Inclusion Institute (i3) and received multiple awards including the Masters’ prize in Library & Information Science at Syracuse University and honorable mention as a 2022 Better Scientific Software (BSSw) Fellow.