In the academic world, the number of scientists grows every year, and so does the number of authors who share the same name. Consequently, it is challenging to assign newly published papers to their respective authors, and Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository, which contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initial. Within each group, an author is identified by capturing their relation to their co-authors and to their area of research, which is represented by the titles of that author's validated publications. For this purpose, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
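The grouping step described above (same last name, same first-name initial) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `blocking_key` and `group_by_key`, the sample names, and the name-parsing rule (names written as "First [Middle] Last", with any purely numeric token treated as a homonym suffix and dropped) are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Return '<first initial> <last name>' in lower case, e.g. 'j smith'.

    Assumption (not from the paper): names are written
    'First [Middle ...] Last', and purely numeric tokens such as a
    trailing homonym suffix are ignored.
    """
    tokens = [t for t in full_name.split() if not re.fullmatch(r"\d+", t)]
    if not tokens:
        return ""
    first, last = tokens[0], tokens[-1]
    return f"{first[0]} {last}".lower()


def group_by_key(names):
    """Group author name strings that share the same blocking key."""
    groups = defaultdict(set)
    for name in names:
        groups[blocking_key(name)].add(name)
    return dict(groups)


# Hypothetical sample names, chosen only to show the grouping behaviour.
names = [
    "John Smith",
    "Jane Smith",
    "J. Smith",
    "Wei Wang 0001",
    "Wen Wang",
    "Anna Lee",
]
groups = group_by_key(names)
for key, members in sorted(groups.items()):
    print(key, sorted(members))
```

Each resulting group is a set of candidate name strings that may refer to one or several real-world authors. In the approach described above, the co-author and title information is what would then separate the individuals inside a group.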