Performance comparison of CFD-DEM solver MFiX-Exa, on GPUs and CPUs - Zhuanzhi Papers (专知论文)

August 19, 2021

Shandong Lao, Aaron Holt, Deepthi Vaidhynathan, Hariswaran Sitaraman, Christine M. Hrenya, Thomas Hauser

We present computational performance comparisons of gas-solid simulations performed on current CPU and GPU architectures using MFiX-Exa, a CFD-DEM solver that leverages hybrid CPU+GPU parallelism. A representative fluidized bed simulation, with particle counts ranging from 2 to 67 million, is used to compare serial and parallel performance. A single GPU was observed to be about 10 times faster than a single CPU core. Using 3 GPUs on a single compute node was observed to be 4x faster than using all 64 CPU cores. We also observed that an error-controlled adaptive time-stepping scheme for the particle advance provided a consistent 4x speed-up on both CPUs and GPUs. Weak scaling results indicate superior parallel efficiency on GPUs compared to CPUs for the problem sizes studied in this work.
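The abstract reports its results as speed-up ratios and weak-scaling efficiencies without stating the formulas. The sketch below uses the conventional HPC definitions, S = T_baseline / T_candidate and E_N = T_1 / T_N for a problem whose size grows in proportion to the number of devices N; the paper may normalize differently (for example, per time step or per particle). The function names and all timings are hypothetical placeholders chosen for illustration, not data from the study.

```python
# Illustrative sketch of common HPC benchmarking metrics.
# All timings below are hypothetical placeholders, not MFiX-Exa measurements.


def speedup(t_baseline: float, t_candidate: float) -> float:
    """Speed-up S = T_baseline / T_candidate (S > 1 means the candidate is faster)."""
    return t_baseline / t_candidate


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Weak-scaling efficiency E_N = T_ref / T_N.

    In weak scaling the work per device is held fixed while the device count
    (and total problem size) grows, so ideal wall time stays constant and E_N = 1.
    """
    return t_ref / t_n


if __name__ == "__main__":
    # Hypothetical wall times in seconds for the same simulation.
    t_64_cpu_cores, t_3_gpus = 400.0, 100.0
    print(f"3 GPUs vs 64 cores: {speedup(t_64_cpu_cores, t_3_gpus):.1f}x")

    # Hypothetical weak-scaling series: device count -> wall time in seconds.
    times = {1: 50.0, 2: 52.0, 4: 55.0, 8: 62.5}
    for n, t in times.items():
        print(f"N={n}: efficiency {weak_scaling_efficiency(times[1], t):.0%}")
```

An efficiency that falls as N grows indicates that communication or load imbalance is eating into the ideal constant wall time.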

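The abstract credits an error-controlled adaptive time-stepping scheme for the particle advance with a consistent 4x speed-up, but it does not describe that scheme. The sketch below only illustrates the general idea under assumed choices: a damped linear oscillator standing in for a soft-sphere spring-dashpot contact, an embedded Heun/Euler pair whose difference serves as the local error estimate, and a standard step-size controller that rejects and retries steps whose scaled error exceeds 1. The constants `K`, `C`, `M`, the tolerances, and all function names are hypothetical; this is not MFiX-Exa's implementation.

```python
import math
from typing import Tuple

# Illustrative sketch only; not MFiX-Exa's actual scheme.
# Hypothetical contact parameters: stiffness K [N/m], damping C [N*s/m], mass M [kg].
K, C, M = 1.0e4, 5.0, 1.0


def accel(x: float, v: float) -> float:
    """Acceleration from a linear spring-dashpot force (toy soft-sphere contact)."""
    return -(K * x + C * v) / M


def heun_euler_step(x: float, v: float, dt: float) -> Tuple[float, float, float, float]:
    """Embedded Heun (2nd order) / Euler (1st order) step.

    Returns the Heun solution plus the absolute differences between the two
    solutions, which serve as a local error estimate.
    """
    a0 = accel(x, v)
    xe, ve = x + dt * v, v + dt * a0  # Euler predictor
    a1 = accel(xe, ve)
    xh = x + 0.5 * dt * (v + ve)  # Heun corrector: average the two slopes
    vh = v + 0.5 * dt * (a0 + a1)
    return xh, vh, abs(xh - xe), abs(vh - ve)


def integrate(x, v, t_end, dt, tol_x=1e-6, tol_v=1e-4, safety=0.9):
    """Advance (x, v) to t_end, accepting a step only if its scaled error <= 1."""
    t, accepted, rejected = 0.0, 0, 0
    while t < t_end:
        dt = min(dt, t_end - t)  # do not overshoot t_end
        xn, vn, ex, ev = heun_euler_step(x, v, dt)
        err = max(ex / tol_x, ev / tol_v)  # per-component scaled error
        if err <= 1.0:
            t, x, v = t + dt, xn, vn
            accepted += 1
        else:
            rejected += 1  # retry from the same state with a smaller dt
        # The estimate is O(dt^2), so rescale dt by err**(-1/2), within limits.
        factor = safety * max(err, 1e-12) ** -0.5
        dt *= min(5.0, max(0.2, factor))
    return x, v, accepted, rejected


if __name__ == "__main__":
    x, v, acc, rej = integrate(x=0.0, v=-1.0, t_end=0.1, dt=1e-3)
    print(f"x={x:.6f} v={v:.6f} accepted={acc} rejected={rej}")
```

The controller shrinks the step when the error estimate is large and grows it (by at most 5x) when the estimate is small, so the step size follows the accuracy demand instead of being fixed at a conservative worst-case value.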
