perf4sight: A toolflow to model CNN training performance on Edge GPUs - Zhuanzhi Papers

12 August 2021

perf4sight: A toolflow to model CNN training performance on Edge GPUs

Aditya Rajagopal, Christos-Savvas Bouganis

From arXiv. Accepted into the Workshop on Embedded and Real-World Computer Vision in Autonomous Driving (ERCVAD), ICCV 2021.

The increased memory and processing capabilities of today's edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural Network's (CNN) structure and parameters to the input data distribution leads to systems with lower memory footprint, latency and power consumption. However, due to the limited compute resources and memory budget on edge devices, it is necessary for the system to be able to predict the latency and memory footprint of the training process in order to identify favourable training configurations of the network topology and device combination for efficient network adaptation. This work proposes perf4sight, an automated methodology for developing accurate models that predict CNN training memory footprint and latency given a target device and network. This enables rapid identification of network topologies that can be retrained on the edge device with low resource consumption. With PyTorch as the framework and NVIDIA Jetson TX2 as the target device, the developed models predict training memory footprint and latency with 95% and 91% accuracy respectively for a wide range of networks, opening the path towards efficient network adaptation on edge GPUs.
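As a rough illustration of what predicting a training cost from a configuration can look like, the sketch below fits a straight line to profiled measurements and scores it on a held-out point. Everything in it is an assumption made for illustration: the measurements are made up, batch size is used as the only feature, and the linear form and the error metric are not taken from the paper. It is not the perf4sight method.

```python
# Hypothetical illustration only: the data, the single batch-size feature,
# the linear model and the error metric are assumptions, not perf4sight.


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at x."""
    intercept, slope = model
    return intercept + slope * x


def mean_abs_pct_error(model, xs, ys):
    """Mean absolute percentage error of the model on (xs, ys)."""
    errors = [abs(predict(model, x) - y) / y for x, y in zip(xs, ys)]
    return 100.0 * sum(errors) / len(errors)


# Made-up profiling data: batch size -> peak training memory in MB.
batch_sizes = [8, 16, 32, 64]
memory_mb = [900.0, 1300.0, 2100.0, 3700.0]

model = fit_line(batch_sizes, memory_mb)

# Made-up held-out measurement used to score the fitted model.
held_out_sizes = [48]
held_out_memory_mb = [3000.0]

predicted = predict(model, 48)
error_pct = mean_abs_pct_error(model, held_out_sizes, held_out_memory_mb)
print(f"predicted memory at batch size 48: {predicted:.1f} MB")
print(f"mean absolute percentage error: {error_pct:.2f}%")
```

A predictor of this kind lets a candidate configuration be rejected before any training run is launched, for example when the predicted memory exceeds the device budget.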

Related content

tf_geometric: a friendly and efficient graph neural network (GNN) library based on TensorFlow

Zhuanzhi Member Services · 26+ reads · 9 August 2021

A survey of networks for distributed deep learning training

Zhuanzhi Member Services · 48+ reads · 2 February 2021

[Google] Smooth Adversarial Training

Zhuanzhi Member Services · 49+ reads · 4 July 2020

[Alibaba DAMO Academy] TResNet: High Performance GPU-Dedicated Architecture

Zhuanzhi Member Services · 33+ reads · 1 April 2020

[Edge intelligence survey paper] A Survey on Edge Intelligence

Zhuanzhi Member Services · 123+ reads · 30 March 2020

[ICLR 2020] Training Binary Neural Networks with Real-to-Binary Convolutions

Zhuanzhi Member Services · 26+ reads · 26 March 2020

Bag of Tricks for Image Classification (17-page slide deck)

Zhuanzhi Member Services · 95+ reads · 12 March 2020

[Paper] ImageNet Classification with Deep Convolutional Neural Networks

Zhuanzhi Member Services · 14+ reads · 1 January 2020

New edition of the Microsoft Research book Foundations of Data Science, with 479-page PDF download

Zhuanzhi Member Services · 136+ reads · 26 October 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Services · 49+ reads · 17 October 2019

Related news

Unsupervised Learning via Meta-Learning

CreateAMind · 43+ reads · 3 January 2019

Facebook open-sources PyText on GitHub

AINLP · 7+ reads · 14 December 2018

A deep yet lightweight network for real-time object detection

计算机视觉战队 · 7+ reads · 15 August 2018

[CNN] Understanding convolutional neural networks (CNN) in one article

产业智能官 · 18+ reads · 2 January 2018

Algorithm optimisation | Gradient descent and stochastic gradient descent from scratch

全球人工智能 · 8+ reads · 25 December 2017

Introduction to deep learning: a step-by-step guide to training a model with TensorFlow

全球人工智能 · 4+ reads · 21 October 2017

Recommended | Tutorial code for deep learning with PyTorch

全球人工智能 · 10+ reads · 8 October 2017

[Recommended] Understanding LSTM with TensorFlow

机器学习研究会 · 36+ reads · 11 September 2017

[Recommended] A hands-on practical guide to CNNs with TensorFlow

机器学习研究会 · 5+ reads · 17 August 2017

Reinforcement learning: cartpole_a3c

CreateAMind · 9+ reads · 21 July 2017

Related papers

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

arXiv · 0+ reads · 11 October 2021

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

arXiv · 0+ reads · 9 October 2021

Training Graph Neural Networks with 1000 Layers

arXiv · 13+ reads · 14 June 2021

Neural Architecture Search without Training

arXiv · 10+ reads · 11 June 2021

SiamVGG: Visual Tracking using Deeper Siamese Networks

arXiv · 5+ reads · 3 March 2019

Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device

arXiv · 3+ reads · 16 October 2018

Efficient and Effective $L_0$ Feature Selection

arXiv · 5+ reads · 7 August 2018

MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip

arXiv · 3+ reads · 5 July 2018

Towards Efficient Dynamic Virtual Network Embedding Strategy for Cloud IoT Networks

arXiv · 4+ reads · 30 January 2018

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

arXiv · 7+ reads · 23 January 2018
