Scientific paper: Scientific Discourse Tagging for Evidence Extraction

From arXiv. Accepted at the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021). 9 pages of main text, 3 pages of references, and 1 page of supplementary information; 6 figures and 6 tables.

Evidence plays a crucial role in any biomedical research narrative, providing justification for some claims and refutation for others. We seek to build models of scientific argument using information extraction methods from full-text papers. We present the capability of automatically extracting text fragments from primary research papers that describe the evidence presented in that paper's figures, which arguably provides the raw material of any scientific argument made within the paper. We apply richly contextualized deep representation learning pre-trained on biomedical domain corpus to the analysis of scientific discourse structures and the extraction of "evidence fragments" (i.e., the text in the results section describing data presented in a specified subfigure) from a set of biomedical experimental research articles. We first demonstrate our state-of-the-art scientific discourse tagger on two scientific discourse tagging datasets and its transferability to new datasets. We then show the benefit of leveraging scientific discourse tags for downstream tasks such as claim-extraction and evidence fragment detection. Our work demonstrates the potential of using evidence fragments derived from figure spans for improving the quality of scientific claims by cataloging, indexing and reusing evidence fragments as independent documents.
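To make the notion of an "evidence fragment" concrete, here is a minimal rule-based sketch that groups results-section sentences by the subfigure they cite. It is not the paper's method (the abstract describes a learned discourse tagger built on pre-trained representations). The citation pattern, the function names, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern for subfigure mentions such as "Fig. 1A" or "Figure 2B".
# Real papers use many more citation styles; this is an illustrative assumption.
SUBFIG_PATTERN = re.compile(r"\bFig(?:ure|\.)?\s*(\d+)\s*([A-Z])\b")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def group_by_subfigure(results_text):
    """Map each subfigure code (e.g. '1A') to the sentences that mention it."""
    fragments = defaultdict(list)
    for sentence in split_sentences(results_text):
        # A set avoids adding the same sentence twice for a repeated citation.
        codes = {f"{num}{letter}" for num, letter in SUBFIG_PATTERN.findall(sentence)}
        for code in sorted(codes):
            fragments[code].append(sentence)
    return dict(fragments)


# Made-up results text, used only to exercise the sketch.
results_text = (
    "Knockdown of the gene reduced cell growth (Fig. 1A). "
    "Protein levels were also lower in treated cells (Fig. 1B). "
    "The growth defect was rescued by re-expression (Figure 1A)."
)
fragments = group_by_subfigure(results_text)
for code, sentences in sorted(fragments.items()):
    print(code, len(sentences))
```

A rule like this only catches sentences that cite a figure explicitly. Attaching neighboring sentences that describe the same data without a citation is where discourse tags, as described in the abstract, would plausibly help.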


Related content

Information Extraction


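Information extraction (IE) converts the information contained in text into a structured, table-like form. The input to an IE system is raw text; the output is a set of information points in a fixed format. These points are extracted from many kinds of documents and then integrated in a uniform representation, which is the main task of information extraction. The benefit of a uniform representation is that the information becomes easy to inspect and compare. IE does not attempt to understand an entire document; it analyzes only the parts that contain relevant information. Which information counts as relevant is determined by the domain scope fixed when the system is designed.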
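The pipeline described above (raw text in, fixed-format records out, with relevance defined by a domain chosen at design time) can be sketched in a few lines. This is a toy template-matching example, not a description of any particular IE system. The `HireRecord` type, the "joined ... in ..." template, and the sample documents are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HireRecord:
    """Fixed output format: one row of the table that IE fills in."""
    person: str
    organization: str
    year: int


# Hypothetical template for one narrow domain: "<Person> joined <Organization> in <year>".
# Choosing this template is the design-time decision about what counts as relevant.
HIRE_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) joined (?P<org>[A-Z][\w ]*?) in (?P<year>\d{4})"
)


def extract_hires(documents):
    """Scan raw documents and return uniform records; unrelated text is ignored."""
    records = []
    for doc in documents:
        for match in HIRE_PATTERN.finditer(doc):
            records.append(
                HireRecord(match["person"], match["org"], int(match["year"]))
            )
    return records


# Made-up input documents with different surrounding text.
documents = [
    "After graduating, Alice Smith joined Acme Labs in 2019. The weather was mild that year.",
    "Press release: Bob Jones joined Globex in 2021 as lead engineer.",
]
records = extract_hires(documents)
for record in records:
    print(record)
```

Because every record has the same fields, the results from different documents can be compared or loaded into a table directly, which is the benefit of uniform integration noted above.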

[2020 Keyword Extraction] Keywords extraction with deep neural network model

Zhuanzhi member service, May 2, 2020

[2020 Keyword Extraction] Keyword extraction and structuralization of medical reports

Zhuanzhi member service, May 2, 2020

[Tsinghua University and Tencent] Review and Outlook for Relation Extraction

Zhuanzhi member service, April 8, 2020

[ICLR 2020] Efficient Probabilistic Logic Reasoning with Graph Neural Networks

Zhuanzhi member service, January 29, 2020