Author names are often ambiguous: the same author may appear under different names, and multiple authors may share similar names. This ambiguity makes it difficult to associate a scholarly work with the person who wrote it, which introduces inaccuracy in credit attribution, bibliometric analysis, search-by-author in digital libraries, and expert discovery. A plethora of techniques for author name disambiguation (AND) has been proposed in the literature. In this review, I focus on research efforts aimed at disambiguating author names. I first go through the conventional methods, then discuss evaluation techniques and the clustering model, which finally leads to Bayesian learning and the greedy agglomerative approach. I believe this concentrated review will be useful to the research community because it discusses techniques applied to a very large, real database that is actively used worldwide. The Bayesian and greedy agglomerative approaches should help tackle AND problems more effectively. Finally, I outline a few directions for future work.