This paper presents the design and implementation of a new open-source view-based graph analytics system called Graphsurge. Graphsurge is designed to support applications that analyze multiple snapshots or views of a large-scale graph. Users program Graphsurge through a declarative graph view definition language (GVDL) to create views over input graphs and a Differential Dataflow-based programming API to write analytics computations. A key feature of GVDL is the ability to organize views into view collections, which allows Graphsurge to automatically share computation across views, without users writing any incrementalization code, by performing computations differentially. We then introduce two optimization problems that naturally arise in our setting. First is the collection ordering problem to determine the order of views that leads to minimum differences across consecutive views. We prove this problem is NP-hard and show a constant-factor approximation algorithm drawn from literature. Second is the collection splitting problem to decide on which views to run computations differentially vs from scratch, for which we present an adaptive solution that makes decisions at runtime. We present extensive experiments to demonstrate the benefits of running computations differentially for view collections and our collection ordering and splitting optimizations.
翻译:本文展示了一个新的开放源源基于图像的图形分析系统的设计和执行。 图形搜索旨在支持用于分析大型图形的多重快照或视图的应用程序。 用户程序图形通过声明图形视图定义语言( GVDL) 生成对输入图形和不同数据流编程 API 的视图, 以撰写分析计算。 GVDL 的一个关键特征是能够将观点组织成视图集, 使图形突顯能够通过进行不同计算, 使用户不撰写任何递增代码, 从而自动分享各种观点的计算。 我们随后引入了两个自然在设置中产生的优化问题 。 首先, 收集需要确定观点的顺序, 从而导致连续观点之间的最小差异 。 我们证明了这个问题是硬的, 并展示了从文献中提取的恒定点近似算算法。 其次, 要决定哪些观点从零到零进行不同计算, 我们为此提出一个适应性解决方案, 我们在运行一个运行优化解决方案, 以显示在运行差异采集和图像采集的收益。