One fundamental problem in temporal graph analysis is to count the occurrences of small connected subgraph patterns (i.e., motifs), which benefits a broad range of real-world applications, such as anomaly detection, structure prediction, and network representation learning. However, existing works focused on exacting temporal motif are not scalable to large-scale temporal graph data, due to their heavy computational costs or inherent inadequacy of parallelism. In this work, we propose a scalable parallel framework for exactly counting temporal motifs in large-scale temporal graphs. We first categorize the temporal motifs based on their distinct properties, and then design customized algorithms that offer efficient strategies to exactly count the motif instances of each category. Moreover, our compact data structures, namely triple and quadruple counters, enable our algorithms to directly identify the temporal motif instances of each category, according to edge information and the relationship between edges, therefore significantly improving the counting efficiency. Based on the proposed counting algorithms, we design a hierarchical parallel framework that features both inter- and intra-node parallel strategies, and fully leverages the multi-threading capacity of modern CPU to concurrently count all temporal motifs. Extensive experiments on sixteen real-world temporal graph datasets demonstrate the superiority and capability of our proposed framework for temporal motif counting, achieving up to 538* speedup compared to the state-of-the-art methods. The source code of our method is available at: https://github.com/steven-ccq/FAST-temporal-motif.
翻译:在时间图分析中,一个根本性的问题是计算小型连接的子图模式(即motifs)的发生,这种模式有利于一系列广泛的真实世界应用,例如异常检测、结构预测和网络代表性学习。然而,目前侧重于精确时间图的工程无法与大型时间图数据相适应,因为其计算成本过高或固有的平行关系不足。在这项工作中,我们提出了一个可缩放的平行框架,用于在大型时间图中精确计算时间模型。我们首先根据时间模型的特性对时间模型进行分类,然后设计定制的算法,提供有效战略,精确计算每一类的模型实例。此外,我们的压缩数据结构,即三重和四重反调,使我们的算法能够直接确定每一类的时间模型的情况,根据边际信息以及边缘之间的关系,从而大大提高了计算效率。根据拟议的计算算法,我们设计了一个分级平行的源框架,在内部和内部的平行战略中进行分类,然后设计一个定制的算法,提供有效战略,精确计算每一类的模型实例。此外,我们的压缩数据结构结构,即三重和四重的模型的模型计算能力,用来计算。