DNNs are ubiquitous on edge devices nowadays. With their growing importance and breadth of use cases, it is unrealistic to keep every DNN resident in device memory and expect each inference to run warmed up. Consequently, cold inference, the process of reading, initializing, and executing a DNN model, is becoming commonplace, and optimizing its performance is urgently needed. To this end, we present NNV12, the first on-device inference engine that optimizes for cold inference. NNV12 is built atop three novel optimization knobs: selecting a proper kernel (implementation) for each DNN operator, bypassing the weight-transformation step by caching post-transformed weights on disk, and pipelining the execution of many kernels on asymmetric processors. To tackle the huge search space, NNV12 employs a heuristic-based scheme to obtain a near-optimal kernel scheduling plan. We fully implement a prototype of NNV12 and evaluate its performance through extensive experiments. The results show that NNV12 achieves up to 15.2x and 401.5x speedups compared to state-of-the-art DNN engines on edge CPUs and GPUs, respectively.
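To make the weight-caching knob concrete, the sketch below illustrates the general idea of persisting post-transformed weights to disk so that later cold starts can skip the transformation step. This is a minimal illustration, not NNV12's actual implementation: the cache path, the hashing scheme, and the placeholder layout transform are all assumptions introduced here for clarity.

```python
# Minimal sketch (hypothetical, not NNV12's code) of caching post-transformed
# weights on disk so subsequent cold starts bypass the transformation step.
import os
import hashlib
import numpy as np

CACHE_DIR = "/tmp/weight_cache"  # hypothetical cache location

def transform_weights(weights: np.ndarray) -> np.ndarray:
    """Placeholder for an engine-specific transform (e.g., a layout repack)."""
    # Example only: repack NCHW filters into NHWC layout.
    return np.ascontiguousarray(weights.transpose(0, 2, 3, 1))

def load_transformed(op_name: str, weights: np.ndarray) -> np.ndarray:
    """Return post-transformed weights, reusing an on-disk copy when available."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(op_name.encode() + weights.tobytes()).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.npy")
    if os.path.exists(path):
        # Cache hit: skip the transform entirely on this cold start.
        return np.load(path, mmap_mode="r")
    transformed = transform_weights(weights)
    np.save(path, transformed)  # pay the transform cost only once
    return transformed
```

In a real engine the transform would be kernel-specific (e.g., Winograd or packed GEMM layouts), which is why the caching decision interacts with the kernel-selection knob described above.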