In recent years, there has been growing interest in designing systems for graph pattern mining. Existing systems mostly focus on small patterns and struggle to mine larger ones. In this work, we propose Angelica, a single-machine graph pattern mining system designed to support large patterns. We first propose a new computation model called multi-vertex exploration, which allows us to divide a large pattern mining task into smaller matching tasks. Unlike existing systems, which perform vertex-by-vertex exploration, Angelica explores larger subgraphs by joining smaller subgraphs. Building on this computation model, we further improve performance with an index-based quick pattern technique, which reduces the cost of expensive isomorphism checks, and an approximate join, which mitigates the subgraph explosion that arises with large patterns. Experimental results show that Angelica achieves significant speedups over state-of-the-art graph pattern mining systems and supports large pattern mining tasks that none of the existing systems can handle.
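To illustrate the general idea of exploring larger subgraphs by joining smaller ones, rather than extending a match one vertex at a time, here is a minimal sketch that counts 4-cycles by first matching a smaller sub-pattern (a 2-path, or "wedge") and then joining wedges that share both endpoints. This is a generic join-based matching example, not Angelica's actual algorithm. The function names `wedges` and `count_4_cycles` are hypothetical, the input is assumed to be an undirected edge list, and neither the index-based quick pattern technique nor the approximate join is modeled here.

```python
from collections import defaultdict
from itertools import combinations


def wedges(adj):
    """Enumerate 2-paths u - c - w (the small sub-pattern), keyed by endpoints (u, w) with u < w."""
    for c, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            yield (u, w), c


def count_4_cycles(edges):
    """Count 4-cycles by joining pairs of wedges that share both endpoints (hypothetical helper)."""
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Step 1: match the small sub-pattern and index each match by its join key (the endpoint pair).
    centers_by_endpoints = defaultdict(list)
    for key, center in wedges(adj):
        centers_by_endpoints[key].append(center)

    # Step 2: join. Any two wedges u-c1-w and u-c2-w with distinct centers form the 4-cycle u-c1-w-c2.
    total = 0
    for centers in centers_by_endpoints.values():
        k = len(centers)
        total += k * (k - 1) // 2

    # Each 4-cycle is produced twice, once for each pair of opposite vertices used as endpoints.
    return total // 2


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_4_cycles(square))  # → 1
```

The join key plays the role of the shared vertices between the two smaller matches. A vertex-by-vertex approach would instead grow each partial 4-cycle one neighbor at a time and check closure at the end.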