Convolution is widely used in many fields, especially in convolutional neural networks (CNNs). Because of the rapid growth of training data in CNNs, GPUs have been adopted for acceleration, and memory-efficient algorithms have drawn attention for their high performance. In this paper, we propose two convolution kernels, one for single-channel convolution and one for multi-channel convolution. Both methods achieve high performance by efficiently hiding the access latency of global memory and by sustaining a high ratio of floating-point Fused Multiply-Add (FMA) operations per datum fetched from global memory. Compared with the latest cuDNN library, developed by NVIDIA to accelerate deep-learning computation, our kernels achieve an average speedup of 2.6x for single-channel convolution and 1.4x for multi-channel convolution.
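To make the two techniques named above concrete, the following minimal CUDA sketch shows the general pattern of a shared-memory-tiled single-channel 2D convolution: each thread block stages a haloed input tile into on-chip shared memory once (so subsequent loads do not hit global memory), and each output element is then computed with K*K FMA instructions against that tile, raising the FMA-per-global-load ratio. This is an illustration of the general technique only, not the paper's actual kernels; the tile sizes (TILE_W, TILE_H), the 3x3 filter size, and the clamp-to-edge boundary handling are all assumptions made for the example.

#include <cuda_runtime.h>

// Hypothetical blocking parameters chosen for illustration.
#define TILE_W 32
#define TILE_H 8
#define K 3            // filter width (3x3 filter assumed)
#define R (K / 2)      // filter radius

// Filter coefficients in constant memory, so every FMA reads them
// through the constant cache rather than from global memory.
__constant__ float c_filter[K * K];

__global__ void conv2d_single_channel(const float *__restrict__ in,
                                      float *__restrict__ out,
                                      int width, int height)
{
    // Input tile plus a halo of R cells on every side.
    __shared__ float tile[TILE_H + 2 * R][TILE_W + 2 * R];

    int out_x = blockIdx.x * TILE_W + threadIdx.x;
    int out_y = blockIdx.y * TILE_H + threadIdx.y;

    // Cooperative load: the TILE_W x TILE_H thread block sweeps the
    // haloed region with strided loops, touching global memory once
    // per tile element.
    for (int y = threadIdx.y; y < TILE_H + 2 * R; y += TILE_H) {
        for (int x = threadIdx.x; x < TILE_W + 2 * R; x += TILE_W) {
            int gx = blockIdx.x * TILE_W + x - R;
            int gy = blockIdx.y * TILE_H + y - R;
            // Clamp-to-edge boundary handling (one of several options).
            gx = min(max(gx, 0), width - 1);
            gy = min(max(gy, 0), height - 1);
            tile[y][x] = in[gy * width + gx];
        }
    }
    __syncthreads();

    if (out_x < width && out_y < height) {
        float acc = 0.0f;
        // K*K FMAs per output element, all fed from shared memory.
        #pragma unroll
        for (int fy = 0; fy < K; ++fy)
            #pragma unroll
            for (int fx = 0; fx < K; ++fx)
                acc = fmaf(tile[threadIdx.y + fy][threadIdx.x + fx],
                           c_filter[fy * K + fx], acc);
        out[out_y * width + out_x] = acc;
    }
}

In this sketch, latency hiding comes for free from the usual CUDA execution model (many resident thread blocks overlap the one global load phase of some blocks with the compute phase of others); the paper's kernels pursue the same two goals with their own blocking and scheduling choices.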