Current citation practices in articles are noisy, confusing, and unstandardised, which makes identifying the cited works problematic for both human readers and reference extraction software. In this work, we investigate how these practices are used to reference different types of entities and, in particular, which metadata are most commonly used in bibliographic references. We identified 36 different types of cited entities (the most cited being articles, books, and proceedings papers) within the 34,140 bibliographic references extracted from a large set of journal articles across 27 different subject areas. Analysing these bibliographic references, grouped by type of cited entity, enabled us to highlight the metadata most commonly used to define bibliographic references across the subject areas. However, we also noticed that, in some cases, bibliographic references did not provide the elements essential for easily identifying the work they refer to.