This paper explores the possibility of classifying journal articles by exploiting multiple information sources jointly, rather than relying on a single source at a time. In particular, the Similarity Network Fusion (SNF) technique is used to merge the different layers of information about articles when they are organized as a multiplex network. The proposed method is tested on a case study consisting of the articles published in the Cambridge Journal of Economics. The information about articles is organized in a two-layer multiplex network, where the first layer contains similarities among articles based on their full text and the second layer contains similarities based on their cited references. The unsupervised similarity network fusion process combines the two layers into a new single-layer network. Distance correlation and partial distance correlation indices are then used to estimate the contribution of each layer to the structure of the fused network. Finally, a clustering algorithm is applied to the fused network to obtain a classification of articles. The classification obtained through SNF is evaluated from an expert point of view, by inspecting whether its clusters can be interpreted and labelled with reference to the research programs and methodologies adopted in economics. Moreover, the classification obtained on the fused network is compared with the two classifications obtained when cited references and contents are considered separately. Overall, the classification obtained on the fused network appears to be fine-grained enough to represent the extreme heterogeneity of the contributions published in the Cambridge Journal of Economics.
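To make the fusion step concrete, the following is a minimal sketch of how two similarity layers might be combined. It loosely follows the general SNF scheme (a normalized full kernel per layer, a k-nearest-neighbour local kernel per layer, and iterative cross-diffusion between layers). The function names, the values of `k` and `iterations`, the simplified neighbour selection, and the toy matrices are all illustrative assumptions; this is not the implementation or the data used in the paper.

```python
import numpy as np


def full_kernel(W):
    """Row-normalize a similarity matrix: 1/2 on the diagonal, the rest split by weight."""
    off = np.array(W, dtype=float)
    np.fill_diagonal(off, 0.0)
    # Assumes every article has at least one positive off-diagonal similarity.
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, k):
    """Keep each row's k strongest off-diagonal similarities and row-normalize them."""
    off = np.array(W, dtype=float)
    np.fill_diagonal(off, 0.0)
    n = off.shape[0]
    nearest = np.argsort(-off, axis=1)[:, :k]  # indices of the k largest entries per row
    rows = np.arange(n)[:, None]
    S = np.zeros_like(off)
    S[rows, nearest] = off[rows, nearest]
    return S / S.sum(axis=1, keepdims=True)


def snf_two_layers(W_text, W_refs, k=3, iterations=20):
    """Fuse two similarity layers by cross-diffusion (simplified, assumed parameters)."""
    P1, P2 = full_kernel(W_text), full_kernel(W_refs)
    S1, S2 = local_kernel(W_text, k), local_kernel(W_refs, k)
    for _ in range(iterations):
        # Each layer is updated with the other layer's current status matrix.
        P1_next = S1 @ P2 @ S1.T
        P2_next = S2 @ P1 @ S2.T
        P1, P2 = full_kernel(P1_next), full_kernel(P2_next)
    fused = (P1 + P2) / 2.0
    return (fused + fused.T) / 2.0  # symmetrize the averaged result


def two_group_layer(within, between, size=3):
    """Hypothetical toy layer: two groups of articles with block-constant similarity."""
    layer = np.full((2 * size, 2 * size), between)
    layer[:size, :size] = within
    layer[size:, size:] = within
    np.fill_diagonal(layer, 1.0)
    return layer


# Toy data (made up): six articles in two groups, seen through two layers.
W_text = two_group_layer(0.9, 0.1)   # full-text similarities
W_refs = two_group_layer(0.8, 0.2)   # cited-reference similarities
fused = snf_two_layers(W_text, W_refs, k=2, iterations=10)
```

The resulting `fused` matrix is a single-layer weighted network that could then be passed to a clustering or community detection algorithm; the abstract does not specify which one is used, so that step is left out of the sketch.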