We explore the performance and portability of the high-level programming models Julia and Python/Numba, both LLVM-based, and Kokkos on high-performance computing (HPC) nodes: AMD EPYC CPUs and MI250X graphics processing units (GPUs) on Crusher, the test bed system for Frontier, and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facility. We compare the default performance of a hand-rolled dense matrix multiplication algorithm against vendor-compiled C/OpenMP implementations on CPUs, and against CUDA and HIP on each GPU. Rather than focusing on kernel optimization per se, we select this naive approach because it resembles exploratory work in science and provides a lower bound on performance that isolates the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs. Performance gaps are identified on NVIDIA A100 GPUs for Julia in single precision and for Kokkos, and for Python/Numba in all scenarios. We also comment on half-precision support, productivity, performance portability metrics, and platform readiness. We expect this work to contribute to the understanding and direction of high-level, high-productivity languages in HPC as the first-generation exascale systems are deployed.
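For concreteness, the following is a minimal sketch of the kind of hand-rolled, unoptimized dense matrix multiplication kernel used as the CPU baseline; the function name, square matrix size n, row-major layout, and OpenMP scheduling shown here are illustrative assumptions, not the exact benchmark code.

// Minimal sketch (assumed, for illustration): naive dense matrix
// multiplication C = A * B with a triple loop, parallelized over rows
// with OpenMP. Square matrices of size n in row-major layout are assumed.
#include <omp.h>

void matmul_naive(const double *A, const double *B, double *C, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];  // inner dot product
            C[i * n + j] = sum;
        }
    }
}

Equivalent naive kernels in Julia, Python/Numba, Kokkos, CUDA, and HIP follow the same triple-loop structure, which is what allows the comparison to attribute performance differences to the programming model rather than to algorithmic tuning.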