Link prediction is a key problem for network-structured data and has attracted considerable research effort owing to its diverse applications. Current link prediction methods focus on general networks and depend heavily on either the closed triangular structure of networks or node attributes. Their performance on sparse or highly hierarchical networks has not been well studied. Moreover, the available tree-like benchmark datasets are either simulated, with limited node information, or small in scale. To bridge this gap, we present TeleGraph, a new benchmark dataset consisting of a highly sparse and hierarchical telecommunication network with rich node attributes, for assessing and fostering link inference techniques. Our empirical results suggest that most algorithms fail to produce satisfactory performance on a nearly tree-like dataset, which calls for special attention when designing or deploying link prediction algorithms in practice.
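To illustrate why reliance on closed triangles is problematic for tree-like networks, the following minimal sketch scores node pairs with the common-neighbors heuristic on a toy seven-node binary tree. The tree, the helper names (`build_adjacency`, `common_neighbors`, `held_out_edge_scores`, `non_edge_scores`), and the choice of common neighbors as a representative triangle-based heuristic are all assumptions made for illustration; none of this is TeleGraph data or the evaluation protocol used in the paper.

```python
from itertools import combinations

# Toy binary tree (hypothetical example, not TeleGraph data).
TREE_EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Common-neighbors score: number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def held_out_edge_scores(edges):
    """Score each true edge after removing it from the graph."""
    scores = {}
    for held_out in edges:
        remaining = [e for e in edges if e != held_out]
        adj = build_adjacency(remaining)
        # A leaf may become isolated once its only edge is removed.
        for node in held_out:
            adj.setdefault(node, set())
        scores[held_out] = common_neighbors(adj, *held_out)
    return scores


def non_edge_scores(edges):
    """Score every node pair that is not an edge in the full graph."""
    adj = build_adjacency(edges)
    edge_set = {frozenset(e) for e in edges}
    return {
        (u, v): common_neighbors(adj, u, v)
        for u, v in combinations(sorted(adj), 2)
        if frozenset((u, v)) not in edge_set
    }


positive_scores = held_out_edge_scores(TREE_EDGES)
negative_scores = non_edge_scores(TREE_EDGES)

print("held-out true edges:", positive_scores)
print("non-edges:", negative_scores)
```

Because a tree contains no cycles, the endpoints of a held-out edge never share a neighbor, so every true edge in this toy example scores 0, while non-adjacent pairs such as the siblings (3, 4) score 1. A triangle-based heuristic therefore ranks true links below false ones here, which is consistent with the failure mode described above.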