The best evidence concerning comparative treatment effectiveness comes from clinical trials, the results of which are reported in unstructured articles. Medical experts must manually extract information from articles to inform decision-making, which is time-consuming and expensive. Here we consider the end-to-end task of both (a) extracting treatments and outcomes from full-text articles describing clinical trials (entity identification) and (b) inferring the reported results for the former with respect to the latter (relation extraction). We introduce new data for this task, and evaluate models that have recently achieved state-of-the-art results on similar tasks in Natural Language Processing. We then propose a new method, motivated by how trial results are typically presented, that outperforms these purely data-driven baselines. Finally, we run a fielded evaluation of the model with a non-profit seeking to identify existing drugs that might be repurposed for cancer, showing the potential utility of end-to-end evidence extraction systems.