July 24, 2021

Performance assessment of CUDA and OpenACC in large scale combustion simulations


Guillermo Oyarzun,Dani Mira,Guillaume Houzeaux

GPUs have climbed to the top of supercomputer systems, making life harder for many legacy scientific codes. Many recipes are currently used to port such codes, with no clarity about which option is best. We present a comparative analysis of the two most common approaches, CUDA and OpenACC, applied to the multi-physics CFD code Alya. Our focus is on combustion problems, which are among the most computationally demanding CFD simulations. The most compute-intensive parts of the code were analyzed in detail. New data structures for the matrix assembly step were created to facilitate SIMD execution, which benefits vectorization on the CPU and stream processing on the GPU. As a result, the performance of the CPU code improved by up to 25%. On the GPU, CUDA proved to be up to 2 times faster than OpenACC for the assembly of the matrix. By contrast, similar performance was obtained in the kernels for the vector operations used in the linear solver, where there is minimal memory reuse.
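The two kernel types contrasted in the abstract can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from Alya or from the paper: the names `axpy`, `gather_chunk`, `VECTOR_SIZE` and `NODES_PER_ELEM` are hypothetical, and the chunked layout shown is one common way to expose SIMD lanes during assembly, which may differ from the data structures the authors actually built. The first function is a typical linear-solver vector operation, where each entry is touched once; the second gathers nodal values for a chunk of elements so that the same local node of consecutive elements is contiguous in memory.

```c
#include <assert.h>

/* Hypothetical sizes chosen for illustration; not taken from Alya. */
#define VECTOR_SIZE 4     /* elements processed together in one chunk */
#define NODES_PER_ELEM 4  /* e.g. linear tetrahedra (assumption) */

/* y <- a*x + y. Every entry is read once and written once, so there is
   essentially no data reuse and the loop is limited by memory bandwidth.
   A compiler without OpenACC support ignores the pragma and runs the
   loop serially on the CPU. */
void axpy(int n, double a, const double *x, double *y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Gather the nodal values of one chunk of elements into a
   [local node][lane] layout. Lane `lane` holds element first_elem + lane,
   so a loop over lanes reads contiguous memory and can be vectorized
   on a CPU or mapped to consecutive threads on a GPU. */
void gather_chunk(const int *connectivity, const double *nodal,
                  int first_elem, int n_elem,
                  double chunk[NODES_PER_ELEM][VECTOR_SIZE])
{
    for (int a = 0; a < NODES_PER_ELEM; ++a) {
        for (int lane = 0; lane < VECTOR_SIZE; ++lane) {
            int e = first_elem + lane;
            /* Lanes past the last element are padded with zeros. */
            chunk[a][lane] = (e < n_elem)
                ? nodal[connectivity[e * NODES_PER_ELEM + a]]
                : 0.0;
        }
    }
}
```

Calling `axpy(3, 2.0, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y = {12, 24, 36}`. Because a kernel of this kind does almost no arithmetic per byte moved, there is little for a hand-written CUDA version to optimize beyond what a directive already generates, which is consistent with the similar performance the abstract reports for vector operations.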

