In modern data science problems, techniques for extracting value from big data require performing large-scale optimization over heterogeneous, irregularly structured data. Much of this data is best represented as multi-relational graphs, making vertex-programming abstractions such as those of Pregel and GraphLab ideal fits for modern large-scale data analysis. In this paper, we describe a vertex-programming implementation of a popular consensus optimization technique known as the alternating direction method of multipliers (ADMM). ADMM consensus optimization enables the elegant solution of complex objectives such as inference in rich probabilistic models. We also introduce a novel hypergraph partitioning technique that improves over state-of-the-art partitioning techniques for vertex programming and significantly reduces communication cost by cutting the number of replicated nodes by up to an order of magnitude. We implement our algorithm in GraphLab and measure its scaling performance on a variety of realistic bipartite graph distributions and a large synthetic voter-opinion analysis application. In our experiments, we achieve a 50% improvement in runtime over the current state-of-the-art GraphLab partitioning scheme.
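For context, a minimal sketch of the standard global-consensus form of ADMM (as presented, e.g., by Boyd et al.); the notation ($f_i$, $x_i$, $z$, $u_i$, $\rho$) is the conventional one and is not drawn from this paper's own formulation:

\begin{aligned}
&\text{minimize } \sum_{i=1}^{N} f_i(x_i) \quad \text{subject to } x_i = z, \; i = 1,\dots,N,\\
x_i^{k+1} &= \operatorname*{argmin}_{x_i}\; f_i(x_i) + \tfrac{\rho}{2}\,\lVert x_i - z^k + u_i^k \rVert_2^2, & \text{(local primal updates)}\\
z^{k+1} &= \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{k+1} + u_i^k\bigr), & \text{(global consensus average)}\\
u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}. & \text{(scaled dual updates)}
\end{aligned}

The local $x_i$ and dual $u_i$ updates decouple across subproblems, which is what makes a vertex-programming realization natural: each update touches only a vertex and its neighbors.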