Evidence plays a crucial role in any biomedical research narrative, providing justification for some claims and refutation for others. We seek to build models of scientific argument by applying information extraction methods to full-text papers. We present a capability for automatically extracting text fragments from primary research papers that describe the evidence presented in each paper's figures, evidence that arguably provides the raw material for any scientific argument made within the paper. We apply richly contextualized deep representation learning, pre-trained on a biomedical domain corpus, to the analysis of scientific discourse structures and to the extraction of "evidence fragments" (i.e., the text in the results section describing data presented in a specified subfigure) from a set of biomedical experimental research articles. We first demonstrate our state-of-the-art scientific discourse tagger on two scientific discourse tagging datasets and show that it transfers to new datasets. We then show the benefit of leveraging scientific discourse tags for downstream tasks such as claim extraction and evidence fragment detection. Our work demonstrates the potential of using evidence fragments derived from figure spans to improve the quality of scientific claims by cataloging, indexing, and reusing these fragments as independent documents.