Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.
翻译:为了大大加速这些问题,我们提议了ProbGraph : 一种图形表达方式,它能让简单和快速的近似平行图形的开采,在工作、深度和结果准确性方面有很强的理论保证。 关键的想法是使用Bloom过滤器等概率集的表示方式代表各套脊椎。 这些表示方式的处理速度要快于原顶部的表示方式,这要归功于可传承性和小尺寸。 我们用这些表示方式作为重要的平行图形采矿算法的构件,例如 Clique 计法或 CropGraph 。 当用 ProbGraph 来强化这些算法时,这些算法大大超过平行精确基线(32个核心近50x),同时确保许多输入图形数据集的准确性超过90%。 我们基于概率集的描述方式和理想统计属性的新界限和算法对于数据分析界来说是分别感兴趣的。