For distributed graph processing on massive graphs, a graph is partitioned into multiple equally sized parts, which are distributed among the machines of a compute cluster. In the last decade, many partitioning algorithms have been developed that differ from each other with respect to partitioning quality, partitioning run-time, and the type of graph for which they work best. This plethora of graph partitioning algorithms makes it challenging to select a partitioner for a given scenario. Several studies provide qualitative insights into the characteristics of graph partitioning algorithms that can support such a selection. However, to enable automatic selection, quantitative predictions of the partitioning quality, the partitioning run-time, and the run-time of subsequent graph processing jobs are needed. In this paper, we propose a machine learning-based approach that provides such quantitative predictions for different types of edge partitioning algorithms and graph processing workloads. We show that training on generated graphs achieves high accuracy, which can be further improved by using real-world data. Based on these predictions, the automatic selection reduces the end-to-end run-time on average by 11.1% compared to a random selection, by 17.4% compared to selecting the partitioner that yields the lowest cut size, and by 29.1% compared to the worst strategy. Furthermore, the best strategy is selected in 35.7% of the cases.
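To illustrate the selection step described above, the following Python sketch picks the partitioner whose predicted end-to-end run-time (partitioning time plus downstream processing time) is lowest. This is a minimal sketch, not the paper's implementation: the data structures, the function `select_partitioner`, and the example partitioner names and predicted values are all hypothetical.

```python
# Hypothetical sketch: given per-partitioner predictions from trained
# regression models, select the partitioner that minimizes the predicted
# end-to-end run-time (partitioning time + graph processing time).
from dataclasses import dataclass
from typing import Dict

@dataclass
class Prediction:
    partitioning_time: float  # predicted seconds to partition the graph
    processing_time: float    # predicted seconds for the processing job

def select_partitioner(predictions: Dict[str, Prediction]) -> str:
    """Return the partitioner name with the lowest predicted total run-time."""
    return min(
        predictions,
        key=lambda name: predictions[name].partitioning_time
        + predictions[name].processing_time,
    )

# Illustrative usage with made-up predictions for three edge partitioners.
preds = {
    "HDRF": Prediction(partitioning_time=12.0, processing_time=340.0),
    "DBH": Prediction(partitioning_time=5.0, processing_time=410.0),
    "2PS": Prediction(partitioning_time=30.0, processing_time=290.0),
}
print(select_partitioner(preds))  # -> "2PS" (lowest predicted total: 320.0)
```

Note that this selection criterion deliberately trades cut size for total run-time: a partitioner with a slightly larger cut can still win if it partitions the graph much faster, which is why the abstract reports gains over the lowest-cut-size strategy.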