Meeting both scalability and performance portability requirements is a challenge for any HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics application for the simulation of stellar mergers, we approach this with existing solutions: We employ HPX to obtain fine-grained tasks, which makes it easy to distribute work and to tightly overlap communication and computation. For the computations themselves, we use Kokkos to turn these tasks into compute kernels capable of running on hardware ranging from a few CPU cores to powerful accelerators. There is a missing link, however: while the fine-grained parallelism exposed by HPX is useful for scalability, it can hinder GPU performance when the tasks become too small to saturate the device, causing low resource utilization. To bridge this gap, we investigate multiple GPU work aggregation strategies within Octo-Tiger, add one new strategy, and evaluate their node-level performance impact on recent AMD and NVIDIA GPUs, achieving noticeable speedups.
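To illustrate the core idea behind work aggregation, the sketch below shows how several small, task-sized work items can be batched into a single Kokkos kernel launch instead of being launched individually. This is a minimal conceptual example and not Octo-Tiger's actual aggregation code; the names `WorkItem` and `aggregate_and_launch`, as well as the placeholder computation, are hypothetical and chosen purely for illustration.

```cpp
// Conceptual sketch of GPU work aggregation (not Octo-Tiger's implementation).
// WorkItem and aggregate_and_launch are hypothetical names for illustration.
#include <Kokkos_Core.hpp>
#include <vector>

// One fine-grained task's worth of work: on its own, too small to
// saturate a GPU when launched as a separate kernel.
struct WorkItem {
  int offset;  // start index into the shared data buffer
  int count;   // number of elements this task touches
};

// Instead of one kernel launch per WorkItem, collect a batch and launch
// a single Kokkos kernel over the combined index range.
void aggregate_and_launch(const std::vector<WorkItem>& batch,
                          Kokkos::View<double*> data) {
  // Flatten the batch into device-visible offset/count arrays.
  const int n = static_cast<int>(batch.size());
  Kokkos::View<int*> offsets("offsets", n);
  Kokkos::View<int*> counts("counts", n);
  auto h_off = Kokkos::create_mirror_view(offsets);
  auto h_cnt = Kokkos::create_mirror_view(counts);
  for (int i = 0; i < n; ++i) {
    h_off(i) = batch[i].offset;
    h_cnt(i) = batch[i].count;
  }
  Kokkos::deep_copy(offsets, h_off);
  Kokkos::deep_copy(counts, h_cnt);

  // One kernel over all aggregated work items: each team handles one
  // item, so the device sees enough parallelism to stay busy.
  Kokkos::parallel_for("aggregated_kernel",
      Kokkos::TeamPolicy<>(n, Kokkos::AUTO),
      KOKKOS_LAMBDA(const Kokkos::TeamPolicy<>::member_type& team) {
        const int item = team.league_rank();
        Kokkos::parallel_for(
            Kokkos::TeamThreadRange(team, counts(item)),
            [&](const int j) {
              data(offsets(item) + j) *= 2.0;  // placeholder computation
            });
      });
}
```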