Many of today's deep neural network accelerators, e.g., Google's TPU and NVIDIA's Tensor Cores, are built around accelerating general matrix multiplication (GEMM). However, supporting convolution on GEMM-based accelerators is not trivial. The naive method explicitly lowers the convolution to GEMM, commonly known as im2col, which introduces significant performance and memory overhead. Existing implicit im2col algorithms require unscalable hardware and are inefficient in supporting important convolution variants such as strided convolution. In this paper, we propose a memory-efficient and hardware-friendly implicit im2col algorithm used by Google's TPU, which dynamically converts a convolution into a GEMM with practically zero performance and memory overhead, fully unleashing the power of GEMM engines. Through comprehensive experimental results, we quantitatively argue that this algorithm has been adopted in commercial closed-source platforms, and we are the first to describe its high-level idea and implementation details. Finally, we show that our algorithm can also be generally applied to NVIDIA's Tensor Cores (TC), matching and even outperforming the measured performance on TCs.
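To make the overhead of the naive lowering concrete, the following is a minimal sketch (not from the paper) of explicit im2col in NumPy. The layout (N, H, W, C inputs and R, S, C, K filters), the function name, and the valid-padding assumption are illustrative choices, not the paper's implementation; the point is that the lowered matrix duplicates each input element up to R*S times before a single GEMM is issued.

```python
# Illustrative sketch of explicit im2col lowering of a convolution to GEMM.
# All names and shapes here are assumptions for the example, not the paper's code.
import numpy as np

def im2col_conv(x, w, stride=1):
    """x: input (N, H, W, C); w: filters (R, S, C, K); valid padding."""
    N, H, W, C = x.shape
    R, S, _, K = w.shape
    OH = (H - R) // stride + 1
    OW = (W - S) // stride + 1
    # Explicitly materialize every receptive field as one row:
    # the lowered matrix has N*OH*OW rows of length R*S*C, so each input
    # element is copied up to R*S times -- the memory overhead the text refers to.
    cols = np.empty((N * OH * OW, R * S * C), dtype=x.dtype)
    row = 0
    for n in range(N):
        for i in range(OH):
            for j in range(OW):
                patch = x[n, i*stride:i*stride+R, j*stride:j*stride+S, :]
                cols[row] = patch.reshape(-1)
                row += 1
    # A single large GEMM then replaces the convolution.
    out = cols @ w.reshape(R * S * C, K)      # (N*OH*OW, K)
    return out.reshape(N, OH, OW, K)

# Tiny usage example; a strided convolution lowers to GEMM the same way.
x = np.random.rand(1, 8, 8, 3).astype(np.float32)
w = np.random.rand(3, 3, 3, 16).astype(np.float32)
y = im2col_conv(x, w, stride=2)
print(y.shape)   # (1, 3, 3, 16)
```

An implicit im2col scheme, by contrast, generates these patch addresses on the fly while feeding the GEMM engine, so the lowered matrix is never materialized in memory.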