Filling the Workforce Gap in Data Science and Data Analytics

The workshop was held at iConference 2013

Location: Fort Worth, Texas
Date: Feb. 12-15, 2013


Catherine Blake

Associate Professor, School of Information Sciences
and Associate Director, Center for Informatics Research in Science and Scholarship (CIRSS)
University of Illinois at Urbana-Champaign
Socio-Technical Data Analytics specialization at the iSchool at Illinois

Jeffrey Stanton

Professor and Associate Dean for Research and Doctoral Programs
School of Information Studies
Syracuse University
CAS in Data Science at the School of Information Studies

Ray Larson

School of Information
UC Berkeley


Changes in how information is created, disseminated, and reused provide new opportunities to intervene within the information lifecycle so that the information created today remains available for centuries to come. Jim Gray, the first advocate of data-intensive science, urged us to "support the whole research cycle - from data capture and data curation to data analysis and data visualization" (Hey, Tansley, & Tolle, 2009). The "data deluge" (Hey & Trefethen, 2003) is now a fundamental characteristic of e-science and "big science," especially in disciplines such as cancer research (e.g., the National Center for Biotechnology Information), astronomy (e.g., the Sloan Digital Sky Survey), and atmospheric science (e.g., coupled climate models). The transition toward data-intensive science has created a critical shortage both of a knowledgeable workforce and of best practices for addressing the challenges of data management, curation, and analysis, and of information synthesis.

Programs such as NSF's "DataNet" have been created to integrate "library and archival sciences, cyber-infrastructure, computer and information sciences, and domain science expertise to provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline". The Institute of Museum and Library Services (IMLS) has also invested heavily in educational initiatives that train data curation leaders around the country. Now that data curation training programs are well underway, it is time to focus our attention on workforce gaps in the later activities of the information lifecycle. The McKinsey Global Institute report published in 2011 stated that "A shortage of the analytical and managerial talent necessary to make the most of big data is a significant and pressing challenge …" (Manyika et al., 2011, p. 3). The report goes on to say that "The United States alone faces a shortage of 140,000 to 190,000 people with deep analytics skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings" (Manyika et al., 2011, p. 104).


The goal of this workshop is to provide a forum for iSchool faculty who are developing programs in data analytics, eScience, eResearch, big data, and cyberinfrastructure to discuss best practices with respect to preparing students to fill the workforce need for managers and analysts to analyze big data and make decisions based on their findings. Questions of interest include (but are not limited to):

  • To what extent can big data be incorporated into the iSchool classroom experience?
  • What opportunities might exist to share data and teaching models between institutions?
  • For some students, analytics is a very different way of thinking about information. What teaching strategies have you employed to bring students up the learning curve without sacrificing intellectual rigor?
  • To what extent can faculty infuse their analytics research into the classroom setting?
  • iSchool programs in data science and data analytics tend to emphasize the context in which data is collected and to situate analytics within the broader workflows of science, business, and the community. To what extent is this balance between the social and technical aspects of big data maintained?
  • What makes iSchools uniquely prepared to address the research and educational requirements of data science, data analytics, cyberinfrastructure, and eResearch?

Workshop Format

The workshop organizers represent three different iSchool approaches to data science and analytics, each at a different stage of development. Specifically, Dr. Blake will discuss the MS and PhD in Socio-Technical Data Analytics at Illinois, Dr. Stanton will discuss the CAS in Data Science at Syracuse, and Dr. Saxenian will discuss the MS in Information and Data Science program at UC Berkeley.

Each organizer will present their program and address one or more of the questions above, demonstrating that each iSchool brings a different perspective to the big data workforce need. That said, the iSchools' shared emphasis on people, information, and technology makes them well prepared to address both the research and educational gaps in data science and data analytics.

This workshop will be of interest to students who want to learn more about how training needs in big data are being addressed in iSchools, and to faculty who are developing new programs. Before the workshop, participants will be asked to submit a two-page position statement addressing the research and educational issues and one or more of the questions outlined above. After the organizers' initial presentations, participants will break into small groups to discuss these questions and, in particular, to identify gaps between training needs and existing programs.

References Cited

  • Hey, T., Tansley, S., & Tolle, K. (Eds.). (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Washington: Microsoft Research.
  • Hey, T., & Trefethen, A. (2003). The data deluge: An e-science perspective. In F. Berman, G. C. Fox, & A. J. G. Hey (Eds.), Grid Computing: Making the Global Infrastructure a Reality (pp. 809-824). London, UK: Wiley and Sons.
  • Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., et al. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.