TResNet: 高性能 GPU 专用建筑 (TResNet: High Performance GPU-Dedicated Architecture) - 专知论文

会员服务 ·

0

模型评估 · Better · Performer · MoDELS · GPU ·

2020 年 3 月 30 日

TResNet: High Performance GPU-Dedicated Architecture

翻译：TResNet: 高性能 GPU 专用建筑

Tal Ridnik,Hussam Lawen,Asaf Noy,Itamar Friedman

from arxiv, 9 pages, 5 figures

Many deep learning models, developed in recent years, reach higher ImageNet accuracy than ResNet50, with fewer or comparable FLOPS count. While FLOPs are often seen as a proxy for network efficiency, when measuring actual GPU training and inference throughput, vanilla ResNet50 is usually significantly faster than its recent competitors, offering better throughput-accuracy trade-off. In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs-optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets. Using a TResNet model, with similar GPU throughput to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our TResNet models also transfer well and achieve state-of-the-art accuracy on competitive datasets such as Stanford cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%) and Oxford-Flowers (99.1%). Implementation is available at: https://github.com/mrT23/TResNet

翻译：近年来开发的很多深层次学习模型比ResNet50的图像网络精确度要高一些,而FLOPS的数值较低或可比较。虽然FLOPs通常被视为网络效率的替代物,但在测量实际的GPU培训和推断输送量时,Vanilla ResNet50通常比其最近的竞争对手快得多,提供了更好的吞吐-准确性交易。在这项工作中,我们引入了一系列旨在提升神经网络准确性、同时保留其GPU培训和推断效率的架构修改。我们首先展示和讨论FLOPs-优化造成的瓶颈。我们然后提出更好的利用GPU结构和资产的替代设计。最后,我们引入了一个新的GPU专用模型系列,称为TRSNet,比以前的ConvNet的准确性和效率更高。我们使用一个类似GPUTT50的模型,我们达到了图像网络上80.7%的顶级-1级精确度。我们的TResNet模型也传输良好,并实现了竞争性数据设置上的最新精确度,例如斯坦福-FAR汽车(99%)、CAR-10 % SA-FAR-ODAR-M(99%)、CIAF-10ers(99%)、CIAR-10ers(99%)。

8

相关内容

模型评估

机器学习系统设计系统评估标准

【CVPR2020-Facebook AI】扩展架构的高效视频识别，X3D: Expanding Architectures

【CVPR2020-Facebook AI】扩展架构的高效视频识别，X3D: Expanding Architectures

专知会员服务

22+阅读 · 2020年4月11日

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

专知会员服务

33+阅读 · 2020年4月1日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

专知会员服务

27+阅读 · 2019年11月24日

【论文】生成式教学网络:通过学习生成合成训练数据来加速神经结构搜索（Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data）

【论文】生成式教学网络:通过学习生成合成训练数据来加速神经结构搜索（Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data）

专知会员服务

14+阅读 · 2019年11月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

EfficientDet: Scalable and Efficient Object Detection

EfficientDet: Scalable and Efficient Object Detection

Arxiv

6+阅读 · 2019年11月20日

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Arxiv

3+阅读 · 2019年5月28日

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Arxiv

7+阅读 · 2019年4月16日

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Arxiv

3+阅读 · 2018年11月20日

Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device

Arxiv

3+阅读 · 2018年10月16日

Neural Architecture Optimization

Neural Architecture Optimization

Arxiv

8+阅读 · 2018年9月5日

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Arxiv

4+阅读 · 2018年7月30日

The GAN Landscape: Losses, Architectures, Regularization, and Normalization

Arxiv

3+阅读 · 2018年7月12日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Arxiv

8+阅读 · 2017年11月22日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2020-Facebook AI】扩展架构的高效视频识别，X3D: Expanding Architectures

【CVPR2020-Facebook AI】扩展架构的高效视频识别，X3D: Expanding Architectures

专知会员服务

22+阅读 · 2020年4月11日

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

【阿里巴巴达摩院】TResNet: 高性能的GPU专用架构，GPU-Dedicated Architecture

专知会员服务

33+阅读 · 2020年4月1日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

专知会员服务

27+阅读 · 2019年11月24日

【论文】生成式教学网络:通过学习生成合成训练数据来加速神经结构搜索（Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data）

【论文】生成式教学网络:通过学习生成合成训练数据来加速神经结构搜索（Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data）

专知会员服务

14+阅读 · 2019年11月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

EfficientDet: Scalable and Efficient Object Detection

EfficientDet: Scalable and Efficient Object Detection

Arxiv

6+阅读 · 2019年11月20日

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Arxiv

3+阅读 · 2019年5月28日

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Arxiv

7+阅读 · 2019年4月16日

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks

Arxiv

3+阅读 · 2018年11月20日

Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device

Arxiv

3+阅读 · 2018年10月16日

Neural Architecture Optimization

Neural Architecture Optimization

Arxiv

8+阅读 · 2018年9月5日

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Arxiv

4+阅读 · 2018年7月30日

The GAN Landscape: Losses, Architectures, Regularization, and Normalization

Arxiv

3+阅读 · 2018年7月12日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

Arxiv

8+阅读 · 2017年11月22日

微信扫码咨询专知VIP会员