Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks

Name disambiguation aims to distinguish the unique authors who share the same name. Existing name disambiguation methods typically exploit author attributes to improve disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or job-hopping, which causes the same author's papers to be split apart in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently over time, which means that papers published within a given period have similar text and relation information in the academic network. Inspired by this idea, we introduce the Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divide papers into small blocks based on discriminative author attributes, and blocks belonging to the same author are merged according to the pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning in a single framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information through meta-paths and generates node representations in an inductive way, which scales to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse the multiple meta-path-based representations. A Pseudo-Siamese network consisting of two RNNs takes two paper sequences, ordered by publication time, as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework achieves a significant and consistent performance improvement on the name disambiguation task. The results also show that MA-PairRNN performs well with a small amount of training data and generalizes better across different research areas.
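The block-then-merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `same_author` is a hypothetical stand-in for the MA-PairRNN pairwise classifier (here a simple Jaccard overlap of co-author sets with an arbitrary threshold), the paper records and their field names are invented for illustration, and merging transitively with a union-find structure is an assumption, since the abstract does not state how pairwise decisions are combined.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_author(block_a, block_b, threshold=0.3):
    """Hypothetical stand-in for the MA-PairRNN pairwise classifier.

    Compares the pooled co-author sets of two blocks; the threshold
    is an arbitrary illustrative value, not one from the paper.
    """
    coauthors_a = set().union(*(p["coauthors"] for p in block_a))
    coauthors_b = set().union(*(p["coauthors"] for p in block_b))
    return jaccard(coauthors_a, coauthors_b) >= threshold


def merge_blocks(blocks, classify=same_author):
    """Merge blocks that the pairwise classifier judges to be one author.

    Assumption: merging is transitive, implemented with union-find.
    Each merged group is returned in publication-time order.
    """
    parent = list(range(len(blocks)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        if classify(blocks[i], blocks[j]):
            parent[find(j)] = find(i)

    merged = {}
    for i, block in enumerate(blocks):
        merged.setdefault(find(i), []).extend(block)
    return [sorted(papers, key=lambda p: p["year"]) for papers in merged.values()]


# Invented example: each block groups papers that share one discriminative
# attribute value (for instance, the same email address).
blocks = [
    [{"id": "p1", "year": 2015, "coauthors": {"A. Chen", "B. Li"}}],
    [{"id": "p2", "year": 2018, "coauthors": {"A. Chen", "C. Wu"}}],
    [{"id": "p3", "year": 2017, "coauthors": {"X. Zhao"}}],
]

authors = merge_blocks(blocks)
print([[p["id"] for p in group] for group in authors])
```

In this toy input the first two blocks share a co-author and are merged, while the third stays separate. Passing a different `classify` callable, such as a trained model's prediction function, would replace the heuristic without changing the merging logic.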
