MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications. These datatypes are constructed recursively at runtime from the primitive named types defined in the MPI standard. More recently, the development and deployment of CUDA-aware MPI implementations have encouraged distributed high-performance MPI codes to adopt GPUs. Such implementations allow MPI functions to operate directly on GPU buffers, easing the integration of GPU compute into MPI codes. Despite substantial attention to CUDA-aware MPI implementations, they continue to offer cripplingly poor performance when manipulating derived datatypes that describe GPU-resident data. This work presents a new MPI library, TEMPI, to address this issue. TEMPI first introduces a common internal representation for equivalent MPI derived datatypes. TEMPI can be used as an interposed library on existing MPI deployments without system or application changes. Furthermore, this work presents a performance model of derived datatype handling on GPUs, demonstrating that the previously preferred "one-shot" methods are not always fastest. Ultimately, the interposed-library model of this work demonstrates MPI_Pack speedup of up to 242,000x and MPI_Send speedup of up to 59,000x compared to the MPI implementation deployed on a leadership-class supercomputer. This yields speedup of more than 1,000x in a 3D halo exchange at 192 ranks.
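
To make the setting concrete, the following minimal sketch shows the pattern the abstract describes: a strided derived datatype built from the primitive named type MPI_FLOAT, then passed directly to MPI_Send/MPI_Recv on a device buffer. It assumes a CUDA-aware MPI implementation and at least two ranks; the block dimensions and tag are illustrative only, not taken from the paper.

```c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One face of a 16x16 block: 16 elements of 1 float each, stride 16.
     * MPI_Type_vector builds this recursively from the named type MPI_FLOAT. */
    MPI_Datatype face;
    MPI_Type_vector(16 /*count*/, 1 /*blocklength*/, 16 /*stride*/,
                    MPI_FLOAT, &face);
    MPI_Type_commit(&face);

    /* Device pointer: a CUDA-aware MPI operates on it directly,
     * with no explicit staging through host memory. */
    float *buf;
    cudaMalloc((void **)&buf, 16 * 16 * sizeof(float));

    if (rank == 0) {        /* run with e.g. mpirun -np 2 */
        MPI_Send(buf, 1, face, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 1, face, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(buf);
    MPI_Type_free(&face);
    MPI_Finalize();
    return 0;
}
```

It is exactly this path, the MPI library internally packing the non-contiguous GPU elements described by `face`, that the abstract identifies as a performance weak point in deployed CUDA-aware implementations.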
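
The interposition model can likewise be sketched with the standard PMPI profiling interface: a shared library defines the MPI entry points it wants to intercept and forwards everything else to the underlying implementation. This is a generic sketch of the mechanism, not TEMPI's actual code; the GPU fast path is a placeholder comment.

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Returns nonzero if p is a GPU (device) pointer. */
static int is_device_pointer(const void *p) {
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, p) != cudaSuccess)
        return 0;
    return attr.type == cudaMemoryTypeDevice;
}

/* Intercepted MPI_Pack: the application calls this symbol instead of
 * the deployed MPI's, with no source changes required. */
int MPI_Pack(const void *inbuf, int incount, MPI_Datatype datatype,
             void *outbuf, int outsize, int *position, MPI_Comm comm) {
    if (is_device_pointer(inbuf)) {
        /* ...a specialized GPU packing kernel would run here... */
    }
    /* Forward to the underlying MPI implementation via PMPI. */
    return PMPI_Pack(inbuf, incount, datatype, outbuf, outsize,
                     position, comm);
}
```

Built as a shared object and placed ahead of the MPI library at link time (or loaded via LD_PRELOAD on Linux), such a library interposes on an existing MPI deployment without system or application changes, which is the usage model the abstract claims for TEMPI.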