Methods

The Data Curation Curriculum Search was designed over a multi-step process. The first step was to look for similar projects that might already have course and program databases. There are similar projects such as the International Digital Curation Education and Action Working Group; however, its focus is currently on the curricular content itself and not on cataloging course offerings per se. In the end, a similar course scan upon which to build was not located.

A search strategy was then developed with which to choose courses and programs for inclusion in the database. For a course to be included, either its name or its description had to include a key word or key word combination commonly associated with data curation activities or the associated areas of data science and data management. For the purpose of inclusion, data curation was broadly defined as the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time. This field also includes authentication, data standards, archiving, collection and management, preservation, retrieval, knowledge representation, and policy as it affects data. The starting list of keywords was derived from the current literature on data curation and by consulting the Matrix of Digital Curation Knowledge and Competencies. It evolved as new and relevant terms were identified while reviewing courses for alignment with our focus on data expertise.
The final list included: {Archiving (in a digital or data context) ; Authentication (in a data context) ; Conservation (in a digital or data context); Curation (in a digital or data context) ; Cyberinfrastructure ; Data access ; Data collection ; Data conservancy; Data discovery ; Data mining; Data provenance ; Data quality ; Data retrieval ; Digital library ; Data standards (in a non computer science context) ; Digitization ; e-Science or eScience ; Informatics ; Information architecture ; Information documenting ; Information modeling; Management (in a digital, data, information, or knowledge context) ; Metadata ; Ontology ; Policy (in a digital, data, or information context) ; Preservation (in a digital or data context) ; Representation (in a data, information, or knowledge context) ; Retrieval (in a digital, data, or information context) ; Semantic web ; Systems analysis ; & Web 2.0}. The final list of 44 keywords and keyword groupings included basic terms as well as more general and more specific terms found to be important for surfacing relevant courses. Web 2.0 and Data conservancy did not return any unique items.
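The inclusion rule above (a keyword in the course name or description, with some keywords counting only in a stated context) can be sketched in code. The catalog searches in this project were done manually, so the following Python is only an illustration under assumptions: the term lists are a small hypothetical subset of the full list, and the function names are invented for the example.

```python
import re

# Hypothetical subset of the keyword list. Qualified terms count only
# when one of their context words also appears in the same text.
UNQUALIFIED = ["metadata", "ontology", "data mining", "digital library"]
QUALIFIED = {
    "archiving": ["digital", "data"],
    "preservation": ["digital", "data"],
    "policy": ["digital", "data", "information"],
}

def has_term(text, term):
    """True if `term` occurs as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def is_candidate(title, description):
    """Flag a course whose title or description meets the keyword rule."""
    text = f"{title} {description}"
    if any(has_term(text, term) for term in UNQUALIFIED):
        return True
    return any(
        has_term(text, term) and any(has_term(text, ctx) for ctx in contexts)
        for term, contexts in QUALIFIED.items()
    )
```

A filter of this kind is deliberately broad; it only produces candidates, and the later qualitative review of each description in context decides what stays in the dataset.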

A three-phase process for entering courses and programs into the system was chosen to enable testing of the system over time, to ingest library schools with known data curation programs first, to allow progressive development and roll-out of the front-end delivery module, and to allow management of the project by our limited staff. In phase one, the course catalogs of iSchools were searched. Courses and programs were identified by manually searching the current (as of Fall-Spring 2011) online course catalogs of institutions. Information for both the courses and any associated programs was then input by cut and paste from these catalogs into the curriculum database using a secure data entry system. In phase two, the search was extended to include additional schools known to have data curation programs and any schools that attended the Data Curation Research Summit or the Research Data Workforce Summit in December 2010 as part of the 6th International Digital Curation Conference. In the third phase, the search was extended to include all known accredited library schools with an online searchable course catalog. When searching catalogs, the searches were limited to courses in library or information schools. Courses that were present only in computer science departments without being cross-listed were not ingested during this stage of the search process. Courses in programming were not intended for inclusion based upon our definition of data curation.

Course information entered into the database included: Course title, Course description, Total Credit, Credit type, Degree type, Delivery modes, Programs it is included within, Whether it is specifically listed as required or recommended for a program, The length of the course.

Program information entered into the database included: Program title, Program description, Program URL, Courses specifically listed in that program that are in the database (dynamically created upon course entry).

Each of the above is also linked to Institution Information, which includes: Institution name, Institution address, Institution URL.
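The course, program, and institution fields listed above suggest a simple relational layout. The sketch below is one possible arrangement, not the project's actual schema: the table and column names are assumptions, and Python's built-in sqlite3 module stands in for the MySQL database so the example is self-contained.

```python
import sqlite3

# Hypothetical schema; all table and column names are illustrative.
SCHEMA = """
CREATE TABLE institution (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT,
    url TEXT
);
CREATE TABLE program (
    id INTEGER PRIMARY KEY,
    institution_id INTEGER NOT NULL REFERENCES institution(id),
    title TEXT NOT NULL,
    description TEXT,
    url TEXT
);
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    institution_id INTEGER NOT NULL REFERENCES institution(id),
    title TEXT NOT NULL,
    description TEXT,
    total_credit REAL,
    credit_type TEXT,
    degree_type TEXT,
    delivery_modes TEXT,
    length TEXT
);
-- Link table: a course may belong to several programs.
CREATE TABLE program_course (
    program_id INTEGER NOT NULL REFERENCES program(id),
    course_id INTEGER NOT NULL REFERENCES course(id),
    requirement TEXT,  -- 'required', 'recommended', or NULL if unstated
    PRIMARY KEY (program_id, course_id)
);
"""

def create_database(path=":memory:"):
    """Create an empty curriculum database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def courses_in_program(conn, program_id):
    """Derive a program's course list from the link table at query time."""
    rows = conn.execute(
        "SELECT c.title FROM course c "
        "JOIN program_course pc ON pc.course_id = c.id "
        "WHERE pc.program_id = ? ORDER BY c.title",
        (program_id,),
    )
    return [title for (title,) in rows]
```

Deriving the course list with a join, instead of storing it on the program record, is one way to get a list that updates itself as courses are entered.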

Once all courses had been input, closer inspection showed that many of them, while meeting the general search criteria, were not clearly associated with data curation when the entire course description was viewed in context. The initial search strategy was purposefully broad so as not to inadvertently miss courses or their associated programs. The course descriptions were therefore scanned more closely for courses that fit a data curation profile. Qualitative analysis was performed on the course descriptions to take into account the context in which keywords occurred, and websites were further examined to identify associated degree programs and specializations. Overview courses in general library sciences were not included in the final dataset. Given our broad definition of data curation, some exceptions were made to err on the side of inclusion when course descriptions did not contain enough information. After adding and removing courses and programs, the system was also set to include only graduate level courses, as some systems listed undergraduate level courses with data curation topics.

To further clean and validate the dataset, individuals at each institution were contacted. At least two email contacts were attempted at each institution to a dean/director and an appropriate faculty member with a profile on the institution website suggesting involvement or interest in data curation type activities. A course list was sent in each email asking for validation. Only 8 institutions responded as of August 24, 2011. That number had risen to 21 by September 5, 2011 and 29 by November 2011. Since the final dataset only included courses from 55 of the institutions and only 53 of them included available contact information, the resultant response rate was 54.7%. From this stage, courses were edited, added, or removed based on feedback. Additional courses were also added from new course catalog entries provided by the responses. The dataset as of November 25, 2011 contained 475 courses in 158 separate programs at 55 institutions. The unparsed database contained 715 total courses linked to 203 separate programs at 63 separate institutions.
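The response rate quoted above follows from the 29 responding institutions and the 53 institutions with available contact information. A minimal check of that arithmetic, with a function name chosen only for this example:

```python
def response_rate(responses, contactable):
    """Percentage of contactable institutions that responded, to one decimal."""
    return round(100 * responses / contactable, 1)

print(response_rate(29, 53))  # 54.7
```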

Please note that while syllabi were not systematically collected, a representative set was assembled for consultation during the analysis.

On the technical side, information was input to the system using a secure Web interface developed within LogiXML. Data was entered into a MySQL database. The data was then made available for search through a PHP search utility or direct database query.

General analysis of the dataset for term occurrence was completed using SQL queries. Coding of course descriptions was conducted using ATLAS.ti with codes grounded in the data. Three mutually exclusive analyses were conducted.
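The queries used for the term-occurrence analysis are not reproduced here. As a rough sketch of what such a query could look like, the following counts the course descriptions that contain each term. It assumes a hypothetical `course` table with a `description` column and again uses sqlite3 in place of MySQL.

```python
import sqlite3

def term_occurrence(conn, terms):
    """Count course descriptions containing each term.

    SQLite's LIKE is case-insensitive for ASCII text. It matches
    substrings, so a term such as 'data' would also count 'metadata'.
    """
    counts = {}
    for term in terms:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM course WHERE description LIKE ?",
            (f"%{term}%",),
        ).fetchone()
        counts[term] = n
    return counts
```

Because of the substring behaviour noted in the docstring, raw counts of this kind are best read as a first pass that the grounded coding then refines.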

In one analysis (content family), every distinctive phrase within the course descriptions was entered as a grounded code. No interpretation was used other than simplifying terms to a common grammatical state, such as "digital libraries" and "digital library" both being coded as "digital library" or "record classification" and "classification of records" both being coded as "record classification"; however, "record classification" and "information classification" would remain separate codes. In this round of coding, all separate terms remained separate. Over 900 different terms resulted from coding of 476 different course descriptions, demonstrating a clear lack of consensus in terminology usage among library schools. These terms were then grouped into thirteen families. In general, a term was restricted to one family; however, a few exceptions were made. Decisions about which family to place a term in were made in consultation with curriculum experts in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. The courses in which a term occurred and its co-occurrence with other terms were also considered.
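The "common grammatical state" simplification described above was done by hand in ATLAS.ti. The sketch below only illustrates the idea on the examples given in the text; the singular-form table and the single "X of Y" rewrite rule are assumptions made for this example and would not cover real course descriptions.

```python
# Hypothetical plural -> singular overrides for the example phrases only.
SINGULAR = {"libraries": "library", "records": "record"}

def normalize(phrase):
    """Reduce a phrase to one canonical form without merging distinct terms."""
    words = phrase.lower().split()
    # Rewrite "X of Y" as "Y X", e.g. "classification of records".
    if len(words) == 3 and words[1] == "of":
        words = [words[2], words[0]]
    return " ".join(SINGULAR.get(word, word) for word in words)

for phrase in ["digital libraries", "classification of records",
               "information classification"]:
    print(phrase, "->", normalize(phrase))
```

Note that the function changes only grammatical form: "record classification" and "information classification" still come out as two separate codes, matching the rule that separate terms remained separate in this round.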

In another analysis (course type), each course was also classified as traditional, digital, data inclusive, or data centric. A data centric course was defined as a course that was entirely about data and data topics as depicted by the course description. A digital course was a course in which at least 30% of the content in the course description pertained to digital topics. Data inclusive courses were digital courses that mentioned data topics. Traditional courses were all remaining courses. Each course was coded separately by two coders for these determinations. There were 13 discrepancies, which were then resolved until agreement was reached on all courses. When comparing the content family codes with the course type codes, a few combinations did not appear to make sense, such as a course on a data topic appearing in the traditional course type, but these could often be explained by the course being a methods course or by some other parameter. We did not, however, remove these courses from the dataset.
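The four course types form a small decision rule, which can be written out as follows. This is a sketch of the definitions as stated, not the coders' actual procedure: the three inputs are judgments a human coder would supply, and how the 30% share was estimated is not specified here, so `digital_share` is a hypothetical parameter. The second function lists the courses on which two coders disagree.

```python
def course_type(entirely_data, digital_share, mentions_data):
    """Apply the course type definitions to one coder's judgments.

    entirely_data: description is entirely about data topics
    digital_share: fraction (0 to 1) of the description on digital topics
    mentions_data: description mentions data topics at all
    """
    if entirely_data:
        return "data centric"
    if digital_share >= 0.30:
        return "data inclusive" if mentions_data else "digital"
    return "traditional"

def discrepancies(coder_a, coder_b):
    """Return the course identifiers that the two coders typed differently."""
    return sorted(c for c in coder_a if coder_a[c] != coder_b.get(c))
```

Under this rule a course that mentions data topics but falls below the 30% digital threshold is still typed as traditional, which is consistent with the apparent mismatches between content family and course type noted above.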

The final analysis so far was coding of the program descriptions into traditional, digital, data inclusive, or data centric using the same rules as the course descriptions.

Statistical analysis was completed with SPSS v.19. Graphing of data was performed with Microsoft Excel.

It should be pointed out that the coding of the data was intended to bring out patterns within the course descriptions, but is not intended to be taken as the only interpretation of the courses and programs. The courses and programs were still interpreted holistically to reach our conclusions.

Sample Email

Dear Dr. ...,

The Data Conservancy is an NSF-sponsored partnership aimed at transforming the ability of scientists to answer grand challenge questions. The larger Data Conservancy vision entails scientific data curation as a means to collect, organize, validate, and preserve data so that researchers can address research challenges facing society as a whole. As part of this multi-faceted project, we have been researching curriculum covering data curation and closely related fields. We have researched courses and programs by manually searching the current (as of spring-summer 2011) online course catalogs of institutions. As an iSchool with known data curation coverage in your curriculum, you have been included in our research.

At this time, we were hoping someone at your institution could take a look at the courses we have included to determine if this is an accurate representation of ‘data curation’ type course offerings at your institution.  A list of courses our research has pulled is given below as well as our method for inclusion. Your help is greatly appreciated. Could you please forward this email to the appropriate person if this email has been misdirected?  My initial email to the interim director was not returned, but your research interests on the website suggested you may be able to help.

Sincerely,

Virgil E. Varvel Jr.

Center for Informatics Research in Science and Scholarship
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel St., Champaign, IL 61820
Vvarvel@illinois.edu
(217) 333-1980

For a course to be included, either its name or its description had to include a key word commonly associated with data curation activities or literature. This keyword list included {Archiving (in a digital or data context) ; Authentication (in a data context) ; Conservation (in a digital or data context); Curation (in a digital or data context) ; Cyberinfrastructure ; Data access ; Data collection ; Data conservancy; Data discovery ; Data mining; Data provenance ; Data quality ; Data retrieval ; Digital library ; Data standards (in a non computer science context) ; Digitization ; e-Science or eScience ; Informatics ; Information architecture ; Information documenting ; Information modeling; Management (in a digital, data, information, or knowledge context) ; Metadata ; Ontology ; Policy (in a digital, data, or information context) ; Preservation (in a digital or data context) ; Representation (in a data, information, or knowledge context), Retrieval (in a digital, data, or information context) ; Semantic web ; Systems analysis ; & Web 2.0}. Many of these courses, while meeting the general search criteria, upon closer inspection, were not clearly associated with data curation when viewing the entire course description in context. The initial search strategy was purposefully broad though so as not to inadvertently miss courses. The course descriptions for each course were thus more closely scanned for courses that fit a data curation profile. A qualitative viewing of the descriptions was performed in which the context within which the keywords above occurred was taken into account and courses were eliminated from the pool. Only graduate level courses are included.

Data Curation for the purpose of inclusion was broadly defined as the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time. This field also includes authentication, data standards, archiving, collection and management, preservation, retrieval, knowledge representation, and policy. Even with this broad definition, some exceptions were made when course descriptions did not contain enough information. 

Institution Name, School or Department Name
Street Address
City, State
Country

Course Name | Course Credit | Required or Rec. for Degree | Delivery Mode 1 | Delivery Mode 2
Course Name | Units | Rec. or Req. | N/A | N/A

Creation of this site was funded in part by a grant from the National Science Foundation.