Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing

Many solutions have been proposed for extracting bibliographic references from PDF papers. Machine learning, rule-based and regular-expression approaches are among the methods most commonly adopted by tools addressing this task. This work aims to identify and evaluate all and only the tools that, given a full-text paper in PDF format, can recognise, extract and parse bibliographic references. We identified seven tools: Anystyle, Cermine, ExCite, Grobid, Pdfssa4met, Scholarcy and Science Parse. We compared and evaluated them against a corpus of 56 PDF articles published in 27 subject areas. Anystyle obtained the best overall score, followed by Cermine. However, in some subject areas, other tools achieved better results for specific tasks.

Related content

TOOLS

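This new edition of the TOOLS conference series revives the tradition of 50 conferences held from 1989 to 2012. TOOLS originally stood for "Technology of Object-Oriented Languages and Systems" and later grew to cover all innovative aspects of software technology. Many of today's most important software concepts were first introduced here. TOOLS 50+1, held in 2019 near Kazan, Russia, continued the series in the same spirit of innovation, with the same enthusiasm for all things software-related, the same combination of scientific soundness and industrial applicability, and the same openness to all trends and communities in the field. Official website: http://tools2019.innopolis.ru/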