This report presents the evaluation approach developed for the DARPA Big Mechanism program, which aimed to develop computer systems that read research papers, integrate the information into a computer model of cancer mechanisms, and frame new hypotheses. We employed an iterative, incremental approach to evaluating the three phases of the program. In Phase I, we evaluated the ability of system and human teams to read-with-a-model, capturing mechanistic information from the biomedical literature, integrated with information from expert-curated biological databases. In Phase II, we evaluated the ability of systems to assemble fragments of information into a mechanistic model. The Phase III evaluation focused on the ability of systems to explain experimental observations using models assembled (largely automatically) by the Big Mechanism process. The evaluation for each phase built on earlier evaluations and guided developers toward creating the capabilities needed for the next phase. The report describes our approach, including two innovations: a reference set (a curated data set limited to the major findings of each paper) for assessing the accuracy of systems in extracting mechanistic findings in the absence of a gold standard, and a method for evaluating model-based explanations of experimental data. Results of the evaluation and supporting materials are included in the appendices.