Graph pattern mining (GPM) is an important application that identifies structures from graphs. Despite the recent progress, the performance gap between the state-of-the-art GPM systems and an efficient algorithm--pattern decomposition--is still at least an order of magnitude. This paper clears the fundamental obstacles of adopting pattern decomposition to a GPM system. First, the performance of pattern decomposition algorithms depends on how to decompose the whole pattern into subpatterns. The original method performs complexity analysis of algorithms for different choices, and selects the one with the lowest complexity upper bound. Clearly, this approach is not feasible for average or even expert users. To solve this problem, we develop a GPM compiler with conventional and GPM-specific optimizations to generate algorithms for different decomposition choices, which are evaluated based on an accurate cost model. The executable of the GPM task is obtained from the algorithm with the best performance. Second, we propose a novel partial-embedding API that is sufficient to construct advanced GPM applications while preserving pattern decomposition algorithm advantages. Compared to state-of-the-art systems, our new GPM system, DecoMine, developed based on the ideas, reduces the execution time of GPM on large graphs and patterns from days to a few hours with low programming effort.
翻译:尽管最近取得了进展,但最先进的GPM系统与高效的算法-模式分解系统之间的性能差距至少仍然是一个数量级。本文件明确了采用模式分解到GPM系统的根本障碍。首先,模式分解算法的性能取决于如何将整个模式分解成亚型。最初的方法对不同选择的算法进行了复杂分析,并选择了最复杂最难选择的。显然,对于普通用户甚至专家用户来说,这一方法并不可行。为了解决这个问题,我们开发了一个具有常规和GPM特定优化的GPM汇编器,以生成不同分解选择的算法,这种算法是根据准确的成本模型进行评估的。GPM任务的执行能力取决于如何从算法中分解成子体。第二,我们建议一种新型的局部组合算法,它足以构建先进的GPM应用程序,同时保留模式分解算法的优势。为了解决这个问题,我们开发了一个具有常规和特定精度的GPM模型的GPM模型, 将一个基于州级的时程的时程, 降低我们的系统。