As the number of authors grows exponentially over the years, the number of authors who share the same name grows proportionally. This makes it challenging to assign newly published papers to the correct authors. Author Name Ambiguity (ANA) is therefore considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last name and the same first-name initials. The author within each group is then identified by capturing the relation with their co-authors and area of research, which is represented by the titles of the corresponding author's validated publications. For this purpose, we train a neural network model that learns from the representations of the co-authors and titles. We validate the effectiveness of our approach through extensive experiments on a large dataset.
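The grouping step described above can be illustrated with a minimal sketch. It assumes names are whitespace-separated strings with the given name first and the family name last, and it uses only the first given-name initial as the key. The function names `blocking_key` and `group_by_blocking_key` are hypothetical and are not taken from the paper, and the neural model that resolves authors within each group is not shown.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Build a key '<first initial> <last name>' in lower case.

    Assumption: the first token is a given name and the last token is
    the family name. Real bibliographic data needs more careful parsing.
    """
    parts = full_name.strip().split()
    if not parts:
        return ""
    first_initial = parts[0][0].lower()
    last_name = parts[-1].lower()
    return f"{first_initial} {last_name}"


def group_by_blocking_key(names):
    """Group author name strings that share the same blocking key."""
    groups = defaultdict(list)
    for name in names:
        groups[blocking_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = ["John Smith", "Jane Smith", "Wei Wang"]
    for key, members in group_by_blocking_key(sample).items():
        print(key, members)
```

Each resulting group contains the candidate names that a disambiguation model would then have to separate into distinct real-world authors, for example by comparing co-author lists and publication titles.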