Graph datasets exceed the in-memory capacity of most standalone machines. Traditionally, graph frameworks have overcome memory limitations by scaling out, distributing computation across multiple machines. Emerging frameworks avoid the network bottleneck of distributed data by using Semi-External Memory (SEM), a model in which a single multicore node operates on graphs larger than memory. In SEM, for a graph with $n$ vertices and $m$ edges, $\mathcal{O}(m)$ data resides on disk and $\mathcal{O}(n)$ data resides in memory. This adds complexity for developers, who must explicitly encode I/O within their applications. We present principles that application developers should adopt to achieve state-of-the-art performance while minimizing I/O and memory use for algorithms in SEM. We demonstrate them in Graphyti, an extensible parallel SEM graph library built on FlashGraph and available in Python via pip. In SEM, Graphyti achieves 80% of the performance of in-memory execution and retains the performance of FlashGraph, which outperforms distributed engines such as PowerGraph and Galois.