Information extraction from scholarly articles is challenging because the documents are long and much of the relevant information is implicit in the text, figures, and citations. Scholarly information extraction has applications in exploration, archival, and curation services for digital libraries and knowledge management systems. We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles. Our approach condenses an article's full text into property-value pairs, expressed as a segmented text snippet called a structured summary. We also present a sizable scholarly dataset that combines structured summaries retrieved from a scholarly knowledge graph with the corresponding publicly available scientific articles, and we publish it openly as a resource for the research community. Our results show that structured summarization is a suitable approach for targeted information extraction and that it complements other commonly used methods such as question answering and named entity recognition.
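To make the notion of a structured summary concrete, the following is a minimal sketch of property-value pairs written out as a single segmented text snippet and parsed back. The abstract does not specify the actual serialization, so the separators (`" | "` and `": "`), the function names, and the example properties are all assumptions chosen purely for illustration.

```python
from typing import Dict

# Hypothetical separators: the abstract does not state the real format.
PAIR_SEP = " | "   # separates one property-value segment from the next
KV_SEP = ": "      # separates a property from its value


def serialize_summary(pairs: Dict[str, str]) -> str:
    """Join property-value pairs into one segmented text snippet."""
    return PAIR_SEP.join(f"{prop}{KV_SEP}{value}" for prop, value in pairs.items())


def parse_summary(snippet: str) -> Dict[str, str]:
    """Split a segmented snippet back into property-value pairs."""
    pairs: Dict[str, str] = {}
    for segment in snippet.split(PAIR_SEP):
        prop, sep, value = segment.partition(KV_SEP)
        if sep:  # skip malformed segments that lack a key-value separator
            pairs[prop.strip()] = value.strip()
    return pairs


# Illustrative properties only; not taken from the published dataset.
example = {
    "research problem": "scholarly information extraction",
    "method": "structured summarization",
}
snippet = serialize_summary(example)
recovered = parse_summary(snippet)
```

A flat text form of this kind lets a text-to-text model emit the whole set of pairs as one output sequence, which can then be split back into individual fields; whether MORTY uses this exact layout is not stated in the abstract.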