Dask is a popular parallel and distributed computing framework, which rivals Apache Spark in enabling task-based, scalable processing of big data. The Dask Distributed library forms the basis of this computing engine and provides support for adding new communication devices. It currently has two communication devices: one for TCP and the other for high-speed networks using UCX-Py, a Cython wrapper to UCX. This paper presents the design and implementation of a new communication backend for Dask, called MPI4Dask, that targets modern HPC clusters built with GPUs. MPI4Dask exploits mpi4py over MVAPICH2-GDR, a GPU-aware implementation of the Message Passing Interface (MPI) standard. MPI4Dask provides point-to-point asynchronous I/O communication coroutines: non-blocking, concurrent operations defined using the async/await keywords from Python's asyncio framework. Our latency and throughput comparisons suggest that MPI4Dask outperforms UCX by 6x for 1 Byte messages and by 4x for large messages (2 MBytes and beyond). We also conduct a comparative performance evaluation of MPI4Dask and UCX using two benchmark applications: 1) the sum of a cuPy array and its transpose, and 2) a cuDF merge. MPI4Dask speeds up the overall execution time of the two applications by an average of 3.47x and 3.11x, respectively, on an in-house cluster built with NVIDIA Tesla V100 GPUs for 1-6 Dask workers. We also perform a scalability analysis of MPI4Dask against UCX for these applications on TACC's Frontera (GPU) system with up to 32 Dask workers on 32 NVIDIA Quadro RTX 5000 GPUs and 256 CPU cores. MPI4Dask speeds up the execution time of the cuPy and cuDF applications by an average of 1.71x and 2.91x, respectively, for 1-32 Dask workers on the Frontera (GPU) system.
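To make the abstract's description of "point-to-point asynchronous I/O communication coroutines" concrete, the sketch below shows how such coroutines can be built with mpi4py's non-blocking isend/irecv calls polled from Python's asyncio event loop. This is a minimal illustration under stated assumptions, not MPI4Dask's actual implementation: the coroutine names (mpi_send, mpi_recv) and the sleep(0) polling strategy are ours, and the example uses only standard mpi4py and asyncio APIs.

```python
# Minimal sketch of asyncio-friendly MPI point-to-point coroutines.
# Non-blocking MPI requests are polled with Request.Test()/test() so a
# pending transfer never blocks the event loop; await asyncio.sleep(0)
# yields control to other coroutines between polls.
# Run with: mpiexec -n 2 python this_script.py
import asyncio

from mpi4py import MPI

comm = MPI.COMM_WORLD


async def mpi_send(obj, dest, tag=0):
    """Send a picklable object without blocking the event loop."""
    req = comm.isend(obj, dest=dest, tag=tag)
    while not req.Test():      # poll instead of blocking in Wait()
        await asyncio.sleep(0)


async def mpi_recv(source, tag=0):
    """Receive a picklable object without blocking the event loop."""
    req = comm.irecv(source=source, tag=tag)
    done, obj = req.test()     # lowercase test() yields (flag, message)
    while not done:
        await asyncio.sleep(0)
        done, obj = req.test()
    return obj


async def main():
    rank = comm.Get_rank()
    if rank == 0:
        await mpi_send({"payload": list(range(4))}, dest=1)
    elif rank == 1:
        print("rank 1 received:", await mpi_recv(source=0))


if __name__ == "__main__":
    asyncio.run(main())
```

The polling design mirrors the general pattern the abstract describes: communication progresses through non-blocking MPI requests while the asyncio scheduler remains free to run other worker coroutines concurrently.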