Persistent Kernels for Iterative Memory-bound GPU Applications


May 21, 2022


Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Satoshi Matsuoka

Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel once per time/algorithm step. The termination of each kernel implicitly acts as the barrier required after advancing the solution at every time step. We propose a scheme for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this scheme, the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching, in registers and shared memory, a subset of the output of each time step to be used as input for the following time step. PERKS can be generalized to any iterative solver: it is largely independent of the solver's implementation. We explain the design principle of PERKS and demonstrate its effectiveness for a wide range of iterative 2D/3D stencil benchmarks (geometric mean speedup of 2.29x in small domains and 1.53x in large domains) and for a Krylov subspace solver, conjugate gradient (geometric mean speedup of 4.67x on smaller SpMV datasets from SuiteSparse and 1.39x on larger SpMV datasets).
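The contrast between the host-side loop and the persistent scheme can be illustrated with a small CPU analogy. The sketch below is not the authors' implementation and does not run on a GPU: it assumes a 1D three-point averaging stencil as the solver, uses Python threads to stand in for thread blocks, `threading.Barrier` to stand in for the device-wide barrier, and a per-worker `local` list to stand in for registers and shared memory. All function names (`step`, `host_loop`, `persistent`) are hypothetical. For simplicity each worker caches its whole chunk, whereas the abstract describes caching only a subset of the output.

```python
import threading


def step(left, center, right):
    """Three-point averaging stencil (an assumed example update rule)."""
    return (left + center + right) / 3.0


def host_loop(u, steps):
    """Baseline: one full pass per time step, like one kernel launch per step.

    Finishing a pass plays the role of the implicit barrier at kernel exit.
    """
    u = list(u)
    for _ in range(steps):
        nxt = u[:]  # boundary cells stay fixed
        for i in range(1, len(u) - 1):
            nxt[i] = step(u[i - 1], u[i], u[i + 1])
        u = nxt
    return u


def persistent(u, steps, workers=4):
    """Persistent variant: the time loop lives inside long-running workers."""
    n = len(u)
    # Double-buffered shared storage, standing in for device memory.
    shared = [list(u), list(u)]
    barrier = threading.Barrier(workers)
    # Split interior cells 1..n-2 into contiguous chunks, one per worker.
    bounds = [1 + (n - 2) * w // workers for w in range(workers + 1)]
    chunks = [None] * workers

    def worker(w):
        lo, hi = bounds[w], bounds[w + 1]
        local = shared[0][lo:hi]  # cached chunk, reused across time steps
        for t in range(steps):
            src, dst = shared[t % 2], shared[(t + 1) % 2]
            # Only the two halo cells are read from shared storage.
            padded = [src[lo - 1]] + local + [src[hi]]
            local = [step(padded[k - 1], padded[k], padded[k + 1])
                     for k in range(1, len(padded) - 1)]
            # Only the edge cells that neighbours need are written back.
            if local:
                dst[lo] = local[0]
                dst[hi - 1] = local[-1]
            barrier.wait()  # explicit barrier replaces the kernel boundary
        chunks[w] = local  # full result leaves the cache once, at the end

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    result = [u[0]]
    for c in chunks:
        result.extend(c)
    result.append(u[-1])
    return result
```

In this sketch, each time step touches shared storage only for halo reads and edge-cell writes, while the bulk of each chunk stays in the worker's local cache. Double buffering lets a single barrier per step separate the readers of one buffer from the writers of the next.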

