Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics cause great challenges for efficient storage and subsequent query analysis on them. Current studies apply sketches to summarize graph streams. We propose LSketch that works for heterogeneous graph streams, which effectively preserves the label information carried by the streams in real scenes, thereby enriching the expressive ability of sketches. In addition, as graph streams continue to evolve over time, edges too old may lose their practical significance. Therefore, we introduce the sliding window model into LSketch to eliminate the expired edges automatically. LSketch uses sub-linear storage space and can support structure based queries and time-sensitive queries with high accuracy. We perform extensive experiments over four real datasets, demonstrating the superiority of the proposed method over state-of-the-art methods, in aspects of query accuracy and time efficiency.
翻译:图流数据代表实时应用程序中的数据交互。图流数据挖掘在网络安全、社交网络分析和交通控制等领域中发挥着重要作用。然而,巨大的数据量和高动态性对于高效的数据存储和后续的查询分析带来了巨大挑战。当前研究采用数据快照来总结图流数据。我们提出了 LSketch,适用于异构图流数据,它可以有效地保留实时场景中流中承载的标签信息,从而丰富数据快照的表达能力。此外,因为图流数据会随着时间不断改变,太老的边缘会失去实际意义。因此,我们将滑动窗口模型引入到 LSketch 中,自动消除过期的边缘。LSketch 使用亚线性存储空间,可以支持结构化查询和时敏查询,并具有较高的精度。我们在四个真实的数据集上进行了大量实验,证明了所提出的方法在查询准确性和时间效率方面优于最先进的方法。