Link prediction is one of the central problems in graph mining. However, recent studies highlight the importance of higher-order network analysis, where complex structures called motifs are first-class citizens. We first show that existing link prediction schemes fail to predict motifs effectively. To alleviate this, we establish a general motif prediction problem and propose several heuristics that assess the chances for a specified motif to appear. To make the scores realistic, our heuristics consider, among other factors, correlations between links, i.e., the potential impact of some arriving links on the appearance of other links in a given motif. Finally, for the highest accuracy, we develop a graph neural network (GNN) architecture for motif prediction. Our architecture offers vertex features and sampling schemes that capture the rich structural properties of motifs. While our heuristics are fast and do not need any training, GNNs ensure the highest accuracy of predicting motifs, both for dense motifs (e.g., k-cliques) and for sparse ones (e.g., k-stars). We consistently outperform the best available competitor by more than 10% on average and by up to 32% in area under the curve (AUC). Importantly, the advantages of our approach over schemes based on uncorrelated link prediction grow with motif size and complexity. We also successfully apply our architecture to predicting more arbitrary clusters and communities, illustrating its potential for graph mining beyond motif analysis.
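To illustrate why correlations between links matter when scoring a motif, the following minimal sketch contrasts two ways of combining per-link scores. It is not the heuristic proposed here: the common-neighbours link score, the product aggregation, and the function names `common_neighbours`, `motif_score_independent`, and `motif_score_correlated` are hypothetical choices made for illustration. The independent variant scores each missing motif link on the unchanged graph, as uncorrelated link prediction would. The correlated variant assumes each scored link arrives and inserts it into a copy of the graph, so later links see the structure created by earlier ones.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical types for illustration: adjacency sets of an undirected graph.
Graph = Dict[Hashable, Set[Hashable]]
Edge = Tuple[Hashable, Hashable]


def common_neighbours(g: Graph, u: Hashable, v: Hashable) -> int:
    """Classic link-prediction score: number of neighbours shared by u and v."""
    return len(g[u] & g[v])


def motif_score_independent(g: Graph, motif_edges: Iterable[Edge]) -> float:
    """Score every missing motif link on the unchanged graph and multiply the
    scores, i.e., treat the links as if they appeared independently."""
    score = 1.0
    for u, v in motif_edges:
        if v not in g[u]:  # links already present contribute a factor of 1
            score *= common_neighbours(g, u, v)
    return score


def motif_score_correlated(g: Graph, motif_edges: Iterable[Edge]) -> float:
    """Score missing motif links one at a time, inserting each scored link into
    a copy of the graph so that later links see the neighbourhoods created by
    earlier ones. The result depends on the order of motif_edges."""
    h = {x: set(ns) for x, ns in g.items()}  # copy; the input graph is not mutated
    score = 1.0
    for u, v in motif_edges:
        if v not in h[u]:
            score *= common_neighbours(h, u, v)
            # Assume the link arrives and update both endpoints.
            h[u].add(v)
            h[v].add(u)
    return score


if __name__ == "__main__":
    # A star centred at d; the candidate motif is the triangle a-b-c.
    star = {"a": {"d"}, "b": {"d"}, "c": {"d"}, "d": {"a", "b", "c"}}
    triangle = [("a", "b"), ("a", "c"), ("b", "c")]
    print(motif_score_independent(star, triangle))  # 1.0
    print(motif_score_correlated(star, triangle))   # 2.0
```

In this toy example, the independent variant gives the triangle a score of 1.0, while the correlated variant gives 2.0: once a-b and a-c are assumed to appear, b and c share a as an additional common neighbour, so the last link becomes more likely. This order-dependent, greedy update is only one simple way to model such correlations.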