Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime


October 15, 2020


Weile Wei, Arghya Chatterjee, Kevin Huck, Oscar Hernandez, Hartmut Kaiser

This paper describes how we successfully used the HPX programming model to port the DCA++ application to multiple architectures, including POWER9, x86, ARM v8, and NVIDIA GPUs. We describe the lessons learned from this experience, as well as the benefits of enabling HPX in the application to improve the CPU threading part of the code, which led to an overall 21% improvement across architectures. We also describe how we used HPX-APEX to raise the level of abstraction for understanding performance issues and identifying tasking optimization opportunities in the code, and how these relate to CPU/GPU utilization counters, device memory allocation over time, and CPU kernel-level context switches on a given architecture.
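To make the idea of task-based CPU threading more concrete, here is a minimal sketch in standard C++. It is not HPX code and not taken from DCA++. It splits some hypothetical work into independent tasks and collects the results through futures. HPX exposes a similar futures-based interface (for example `hpx::async` and `hpx::future`, which mirror their `std` counterparts) but schedules tasks as lightweight user-level threads rather than OS threads; whether this mirrors how the authors restructured DCA++ is an assumption. The names `process_chunk` and `run_tasks` are hypothetical.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical work item standing in for one independent chunk of CPU work.
// It is NOT taken from DCA++; it just sums a contiguous range of integers.
double process_chunk(std::size_t chunk_id, std::size_t chunk_size) {
    double acc = 0.0;
    for (std::size_t i = 0; i < chunk_size; ++i) {
        acc += static_cast<double>(chunk_id * chunk_size + i);
    }
    return acc;
}

// Launch one task per chunk, then combine the results through futures.
// std::launch::async runs each task on its own OS thread here; a task-based
// runtime such as HPX would instead schedule lightweight tasks on a pool.
double run_tasks(std::size_t num_chunks, std::size_t chunk_size) {
    std::vector<std::future<double>> futures;
    futures.reserve(num_chunks);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        futures.push_back(
            std::async(std::launch::async, process_chunk, c, chunk_size));
    }
    double total = 0.0;
    for (auto& f : futures) {
        total += f.get();  // waits only for the result this step needs
    }
    return total;
}
```

For example, `run_tasks(4, 1000)` sums the integers 0 through 3999 across four tasks. In a task-based runtime the launch call changes, but the pattern of launching independent work and synchronizing only on the results that are needed stays the same.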

