Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and are therefore widely used in production for this purpose. State-of-the-art implementations, however, are inefficient for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced memory accesses without requiring prior data transformations. Our experiments demonstrate that our proposal yields notable performance improvements across a range of common CNN forward-propagation convolution configurations, achieving speedups of up to 2.29x with respect to the best convolution implementation in cuDNN, and hence covering a relevant region among currently existing approaches.