E-Research Roundtable - How to index biological knowledge about species in one day?

Wednesday, March 29, 2017
12:30pm - 2:00pm

109 IS

Session leader: Dmitry Mozzherin, molecular biologist and biodiversity informatician at the Illinois Natural History Survey
Description: For the last 250 years we have used binomial nomenclature to communicate information about animals, plants and bacteria. The introduction of binomial nomenclature helped tremendously to expand our knowledge about life on our planet. The Biodiversity Heritage Library project has collected more than 50 million pages of this knowledge, spanning several hundred years. To work with this massive amount of data, we need to find and organize the scientific names mentioned on each of these pages. The task is surprisingly complicated because, on average, there are 3 scientific names per species and about 50 different ways these names have been written. Global Names Architecture creates tools that make it possible to find how all these names and their spellings are connected, and to organize and disambiguate them. There are 3 stages in this disambiguation. First is a lexical stage, where spelling variants of a scientific name are organized into lexical groups. The second stage is nomenclatural and traces the evolution of a name in the scientific literature. The third, taxonomic stage finds the currently accepted name for a taxon. Global Names Architecture also develops tools for recognizing scientific names in texts, and our goal is to be able to go through all accumulated biological knowledge and index it in a matter of a day.
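
As a rough illustration of the lexical stage described above, the following Python sketch groups name strings that differ only in capitalization or trailing authorship. It is a toy under stated assumptions, not the Global Names Architecture implementation: the functions canonical_form and lexical_groups are hypothetical, and the rule "the first two alphabetic tokens are genus and specific epithet" is a simplification that real name parsers do not rely on.

```python
import re
from collections import defaultdict


def canonical_form(name_string):
    """Reduce a name string to a crude 'Genus epithet' key.

    Assumption (hypothetical rule): the first two runs of ASCII letters
    are the genus and the specific epithet; anything after them, such as
    authorship or year, is ignored. Names with diacritics or hyphens are
    not handled correctly by this simplification.
    """
    tokens = re.findall(r"[A-Za-z]+", name_string)
    if len(tokens) < 2:
        return None  # not enough information for a binomial
    genus, epithet = tokens[0], tokens[1]
    return f"{genus.capitalize()} {epithet.lower()}"


def lexical_groups(name_strings):
    """Group name strings that share the same crude canonical form."""
    groups = defaultdict(list)
    for name_string in name_strings:
        key = canonical_form(name_string)
        if key is not None:
            groups[key].append(name_string)
    return dict(groups)


# Example input: made-up spelling variants of two well-known names.
names = [
    "Homo sapiens Linnaeus, 1758",
    "Homo sapiens L.",
    "HOMO SAPIENS",
    "Pica pica (Linnaeus, 1758)",
]
groups = lexical_groups(names)
for key, variants in groups.items():
    print(key, "<-", variants)
```

The nomenclatural and taxonomic stages cannot be derived from the strings alone in this way; they would need curated data linking synonyms and accepted names, which this sketch does not attempt.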

Dmitry Mozzherin was a molecular biologist for 15 years. He studied DNA replication and how various nucleotide analogs can be used to selectively switch off the DNA polymerases of viruses while leaving the human replication machinery intact. Later he became interested in the open source movement and learned programming. Dmitry worked at the Encyclopedia of Life project, which collects information about all species in the world, and for the last 8 years he has been trying to figure out how to globally organize scientific information using scientific names as a glue. Dmitry's other passions are wildlife photography and sculpture.