
Full APA Reference

Blake, C., & Lucic, A. (2015). An automated approach to identify endpoints to support the systematic review process. Journal of Biomedical Informatics, 56, 42–56. https://doi.org/10.1016/j.jbi.2015.05.004

Publication Abstract

Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically extract three facets – two entities (the agent and object) and the way in which the entities are compared (the endpoint) – from direct comparative sentences in full-text articles. The system does not require a user to predefine entities in advance and thus can be used in domains where entity recognition is difficult or unavailable. As with a systematic review, the tabular summary produced using the automatically extracted facets shows how experimental results differ between studies. Experiments were conducted using a collection of more than 2 million sentences from three journals (Diabetes, Carcinogenesis, and Endocrinology) and two machine learning algorithms: support vector machines (SVM) and a general linear model (GLM). F1 and accuracy measures for the SVM and GLM differed by only 0.01 across all three comparison facets in a randomly selected set of test sentences. The system achieved its best accuracy, 92%, for objects, whereas accuracy for both agents and endpoints was 73%. F1 scores were higher for objects (0.77) than for endpoints (0.51) or agents (0.47). A situated evaluation of Metformin, a drug used to treat diabetes, showed system accuracy of 95%, 83%, and 79% for the object, endpoint, and agent, respectively. The situated evaluation also had higher F1 scores of 0.88, 0.64, and 0.62 for the object, endpoint, and agent, respectively. On average, only 5.31% of the sentences in a full-text article are direct comparisons, but the tabular summaries suggest that these sentences provide a rich source of currently underutilized information that can be used to accelerate the systematic review process and identify gaps where future research should be focused.
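To make the three comparison facets concrete, the following is a minimal illustrative sketch (not the authors' system, which uses learned classifiers over parsed full-text sentences): it pulls the agent, endpoint, and object out of one simple "X \<verb\> \<comparative\> \<endpoint\> than Y" pattern. The verb and adjective lists and the function name are assumptions made for this example.

```python
import re

# Hypothetical pattern for a direct comparative sentence of the form
# "<agent> <verb> a <comparative adjective> <endpoint> than <object>."
# A real extractor must handle far more varied syntax than this.
COMPARATIVE = re.compile(
    r"(?P<agent>.+?)\s+(?:had|showed|produced|reduced|increased)\s+"
    r"(?:a\s+)?(?:higher|lower|greater|smaller)\s+(?P<endpoint>.+?)\s+"
    r"than\s+(?P<object>.+?)[.]?$"
)

def extract_facets(sentence: str):
    """Return (agent, endpoint, object), or None if the sentence
    does not match the toy comparative pattern."""
    m = COMPARATIVE.match(sentence.strip())
    if not m:
        return None
    return m.group("agent"), m.group("endpoint"), m.group("object")

facets = extract_facets(
    "Metformin produced a greater reduction in fasting glucose than placebo."
)
# facets → ("Metformin", "reduction in fasting glucose", "placebo")
```

Rows like this (agent, endpoint, object) are what the paper's tabular summaries aggregate across studies, so differing experimental results on the same endpoint become directly comparable.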