Tucker decomposition is one of the state-of-the-art (SOTA) CNN model compression techniques. However, in contrast to its FLOPs reduction, we observe only very limited inference-time reduction for Tucker-compressed models with existing GPU software such as cuDNN. To address this gap, we propose an efficient end-to-end framework that generates highly accurate and compact CNN models via Tucker decomposition together with optimized inference code on GPUs. Specifically, we propose an ADMM-based training algorithm that produces highly accurate Tucker-format models. We also develop a high-performance kernel for Tucker-format convolutions, along with analytical performance models that guide the selection of execution parameters. We further propose a co-design framework that determines the proper Tucker ranks driven by practical inference time rather than FLOPs. Our evaluation of five modern CNNs on an A100 GPU demonstrates that our compressed models with our optimized code achieve up to 2.21X speedup over cuDNN, 1.12X speedup over TVM, and 3.27X speedup over the original models running on cuDNN, with at most 0.05% accuracy loss.
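For concreteness, the following is a minimal PyTorch sketch (not taken from the paper) of the standard Tucker-2 convolution format that such frameworks target: a K x C x d x d kernel is replaced by a 1x1 channel-reducing convolution, a d x d core convolution, and a 1x1 channel-expanding convolution. The class name `TuckerConv2d` and all rank/shape parameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TuckerConv2d(nn.Module):
    """Illustrative Tucker-2 factorized convolution (hypothetical sketch).

    Approximates a dense out_ch x in_ch x d x d kernel with Tucker ranks
    (rank_out, rank_in): a 1x1 conv (in_ch -> rank_in, input factor),
    a d x d conv on the core (rank_in -> rank_out), and a 1x1 conv
    (rank_out -> out_ch, output factor).
    """

    def __init__(self, in_ch, out_ch, kernel_size, rank_in, rank_out,
                 stride=1, padding=0):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, rank_in, 1, bias=False)   # input factor matrix
        self.core = nn.Conv2d(rank_in, rank_out, kernel_size,
                              stride=stride, padding=padding,
                              bias=False)                         # core tensor
        self.expand = nn.Conv2d(rank_out, out_ch, 1)              # output factor matrix

    def forward(self, x):
        return self.expand(self.core(self.reduce(x)))

# Per output pixel, multiply-adds drop roughly from out_ch*in_ch*d*d to
# in_ch*rank_in + rank_in*rank_out*d*d + rank_out*out_ch. However, the three
# smaller convolutions run as separate (often memory-bound) kernel launches,
# which is one reason measured speedup can lag far behind the FLOPs reduction.
conv = TuckerConv2d(in_ch=256, out_ch=256, kernel_size=3,
                    rank_in=64, rank_out=64, padding=1)
y = conv(torch.randn(1, 256, 56, 56))  # shape: (1, 256, 56, 56)
```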