Bibliographic coupling (BC) and co-citation (CC) are the two most common citation-based coupling measures of similarity between scientific items. One can interpret these measures as second-neighbor relations distinguished by the direction of the citation: BC is a similarity between two citing items, whereas CC is that between two cited items. A previous study proposed a two-layer node split network that can emulate clusters of coupling measures in a computationally efficient manner; however, the lack of intralayer links makes it impossible to obtain exact similarities. Here, we propose novel methods to estimate intralayer similarity on a node split network using personalized PageRank and neural embedding. We demonstrate that the proposed measures are strongly correlated with the coupling measures. Moreover, our proposed method can yield precise similarities between items even if they are distant from each other. We also show that many links with high similarity are missing in the original BC/CC network, which suggests that it is essential to consider long-range similarities. Comparative experiments on global and local edge sampling suggest that local sampling is stable for both similarities in node split networks. This analysis offers valuable insights into the process of searching for significantly related items regarding each coupling measure.
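To make the second-neighbor interpretation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small hypothetical citation matrix `A` with `A[i, j] = 1` when item `i` cites item `j`, computes BC and CC as matrix products, and then builds a two-layer node split network (one citing copy and one cited copy per item, with interlayer links only) on which personalized PageRank is run by power iteration. The function names, the restart probability `alpha`, the choice to treat the split network as undirected, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def coupling_matrices(A):
    """Return (BC, CC) for a citation matrix A where A[i, j] = 1 if i cites j."""
    A = np.asarray(A, dtype=float)
    bc = A @ A.T  # bc[i, k]: number of references shared by citing items i and k
    cc = A.T @ A  # cc[j, l]: number of items that cite both j and l
    np.fill_diagonal(bc, 0)  # ignore self-similarity
    np.fill_diagonal(cc, 0)
    return bc, cc


def node_split_adjacency(A):
    """Undirected two-layer network: nodes 0..n-1 are citing copies,
    nodes n..2n-1 are cited copies. There are no intralayer links."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    B = np.zeros((2 * n, 2 * n))
    B[:n, n:] = A      # citing copy of i -- cited copy of j
    B[n:, :n] = A.T    # same links, seen from the cited layer
    return B


def personalized_pagerank(B, seed, alpha=0.15, iters=200):
    """Power iteration for a random walk with restart to `seed` (assumed alpha)."""
    deg = B.sum(axis=1)
    P = B / np.where(deg > 0, deg, 1.0)[:, None]  # row-normalized transitions
    e = np.zeros(len(B))
    e[seed] = 1.0
    p = e.copy()
    for _ in range(iters):
        dangling = p[deg == 0].sum()  # mass on isolated nodes returns to the seed
        p = alpha * e + (1 - alpha) * (p @ P + dangling * e)
    return p


# Hypothetical toy data: items 0 and 1 both cite items 3 and 4; item 2 cites only 4.
A = np.zeros((5, 5))
A[0, [3, 4]] = 1
A[1, [3, 4]] = 1
A[2, 4] = 1

bc, cc = coupling_matrices(A)
ppr = personalized_pagerank(node_split_adjacency(A), seed=0)
n = A.shape[0]
bc_like = ppr[:n]  # scores on the citing layer, read as a BC-like similarity to item 0
```

In this toy case the citing-layer scores rank item 1 above item 2 relative to item 0, matching the ordering given by the shared-reference counts in `bc`. Seeding the walk at a cited copy instead (index `n + j`) and reading `ppr[n:]` would give the analogous CC-like scores.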