Fitting a model into GPU memory during training is an increasing concern as models continue to grow. To address this issue, we present Shapeshifter Networks (SSNs), a flexible neural network framework that decouples layers from model weights, enabling us to implement any neural network with an arbitrary number of parameters. In SSNs each layer obtains weights from a parameter store that decides where and how to allocate parameters to layers. This can result in sharing parameters across layers even when they have different sizes or perform different operations. SSNs do not require any modifications to a model's loss function or architecture, making them easy to use. Our approach can create parameter-efficient networks by using a relatively small number of weights, or can improve a model's performance by adding model capacity during training without affecting the computational resources required at test time. We evaluate SSNs using seven network architectures across diverse tasks that include image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters.
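To make the parameter-store idea concrete, below is a minimal sketch (not the authors' implementation) of layers drawing their weights from a shared bank, so parameter count is decoupled from the number and sizes of layers. The names `ParameterStore` and `SharedLinear`, and the simple tile-and-slice allocation strategy, are illustrative assumptions; SSNs decide where and how to map parameters to layers rather than using this fixed scheme.

```python
# Sketch of a shared parameter store: layers of different shapes request
# weights from one flat bank of trainable parameters (hypothetical classes,
# not the paper's actual allocation method).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParameterStore(nn.Module):
    """Holds a single flat bank of trainable weights that layers draw from."""

    def __init__(self, bank_size: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.02)

    def get_weight(self, shape) -> torch.Tensor:
        # Tile the bank if a layer needs more parameters than it holds,
        # then slice and reshape -- one simple way to share weights across
        # layers with different sizes.
        numel = int(torch.tensor(shape).prod())
        repeats = -(-numel // self.bank.numel())  # ceiling division
        flat = self.bank.repeat(repeats)[:numel]
        return flat.view(shape)


class SharedLinear(nn.Module):
    """A linear layer whose weight matrix lives in the shared store."""

    def __init__(self, store: ParameterStore, in_features: int, out_features: int):
        super().__init__()
        self.store = store
        self.shape = (out_features, in_features)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.store.get_weight(self.shape)
        return F.linear(x, weight, self.bias)


# Two layers of different sizes backed by one small bank of weights.
store = ParameterStore(bank_size=4096)
layer1 = SharedLinear(store, 128, 64)
layer2 = SharedLinear(store, 64, 10)
out = layer2(torch.relu(layer1(torch.randn(8, 128))))
print(out.shape)  # torch.Size([8, 10])
```

In this toy version the two linear layers together would normally need 128x64 + 64x10 = 8,832 weights, but both are served from a 4,096-parameter bank, illustrating how total parameter count can be set independently of the architecture.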