Subgraph enumeration is a fundamental problem in graph analytics, which aims to find all instances of a given query graph in a large data graph. In this paper, we propose a system called HUGE to efficiently process subgraph enumeration at scale in a distributed setting. HUGE features 1) an optimiser to compute an advanced execution plan without the constraints of existing works; 2) a hybrid communication layer that supports both push- and pull-based communication; 3) a novel two-stage execution mode with a lock-free and zero-copy cache design; 4) a BFS/DFS-adaptive scheduler to bound memory consumption; and 5) two-layer intra- and inter-machine load balancing. HUGE is generic, such that all existing distributed subgraph enumeration algorithms can be plugged in to enjoy automatic speed-up and bounded-memory execution.
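To make the problem statement concrete, the following is a minimal single-machine sketch of subgraph enumeration by depth-first backtracking. It is not HUGE's algorithm or execution plan; the names `build_graph` and `enumerate_matches` are hypothetical, graphs are assumed to be undirected and unlabelled, and no symmetry breaking is applied, so each instance is reported once per automorphism of the query graph.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def enumerate_matches(query: Graph, data: Graph) -> List[Dict[int, int]]:
    """Return every injective mapping from query vertices to data vertices
    that maps each query edge onto a data edge (illustrative sketch only)."""
    order = sorted(query)  # fixed matching order; a real optimiser would choose this
    matches: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        q = order[len(mapping)]
        # Data vertices already assigned to the query neighbours of q.
        matched = [mapping[n] for n in query[q] if n in mapping]
        if matched:
            # Candidates must be adjacent to all of them.
            candidates = set.intersection(*(data[d] for d in matched))
        else:
            candidates = set(data)
        used = set(mapping.values())
        for d in sorted(candidates - used):
            mapping[q] = d
            extend(mapping)  # depth-first: memory grows with query size only
            del mapping[q]

    extend({})
    return matches


if __name__ == "__main__":
    triangle = build_graph([(0, 1), (1, 2), (0, 2)])
    data = build_graph([(0, 1), (1, 2), (2, 0), (2, 3)])
    for match in enumerate_matches(triangle, data):
        print(match)
```

The depth-first recursion keeps only one partial match in memory at a time, whereas a breadth-first strategy would materialise all partial matches of a given size before extending them. That trade-off between memory and batching is what a BFS/DFS-adaptive scheduler has to balance.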