Graph mining is one of the most important categories of graph algorithms. However, exploring the subgraphs of an input graph produces a huge amount of intermediate data. The 'think like a vertex' programming paradigm pioneered by Pregel was designed for graph computation problems such as PageRank and cannot readily formulate mining problems. Existing mining systems such as Arabesque and RStream need large amounts of computing and memory resources. In this paper, we present Kaleido, an efficient single-machine, out-of-core graph mining system. Kaleido treats the intermediate data of graph mining tasks as a tensor and adopts a succinct data structure to store it. Kaleido uses the eigenvalues of the adjacency matrix of a subgraph to solve the subgraph isomorphism problem efficiently, under the acceptable constraint that a subgraph has fewer than 9 vertices. Kaleido implements half-memory-half-disk storage for large intermediate data, treating the disk as an extension of memory. Compared with two state-of-the-art mining systems, Arabesque and RStream, Kaleido outperforms them by a geometric mean of 12.3$\times$ and 40.0$\times$, respectively.
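To make the eigenvalue idea concrete, here is a minimal sketch and not Kaleido's actual implementation. It fingerprints a small subgraph by the characteristic polynomial of its adjacency matrix, which carries the same information as the multiset of eigenvalues but can be computed exactly in integer arithmetic. The helper names (`adjacency`, `char_poly`) are hypothetical. Relabeling the vertices never changes the fingerprint, so isomorphic subgraphs always match. The converse does not hold for the plain adjacency matrix: non-isomorphic graphs with identical spectra exist, so a real system would need additional information on top of this check.

```python
def adjacency(n, edges):
    """Build the n x n adjacency matrix of an undirected simple graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def matmul(x, y):
    """Multiply two square integer matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(adj):
    """Return the coefficients (highest degree first) of det(xI - A).

    Uses the Faddeev-LeVerrier recurrence. For an integer matrix every
    coefficient is an integer, so the division by k below is exact.
    """
    n = len(adj)
    coeffs = [1]
    m = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A * M_{k-1} + (previous coefficient) * I
        am = matmul(adj, m)
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        # Next coefficient is -trace(A * M_k) / k
        trace = sum(matmul(adj, m)[i][i] for i in range(n))
        coeffs.append(-trace // k)
    return tuple(coeffs)


# Two labelings of the same 3-vertex path, and a triangle.
path_a = adjacency(3, [(0, 1), (1, 2)])
path_b = adjacency(3, [(0, 2), (2, 1)])
triangle = adjacency(3, [(0, 1), (1, 2), (0, 2)])

same_pattern = char_poly(path_a) == char_poly(path_b)
different_pattern = char_poly(path_a) != char_poly(triangle)
```

The two path labelings produce the same fingerprint and the triangle produces a different one. A known counterexample to completeness is the 5-vertex star versus a 4-cycle plus an isolated vertex: they are not isomorphic, yet this fingerprint is identical for both.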