Momentum Residual Neural Networks - Zhuanzhi (专知) Papers


Momentum · ResNet · Neural Networks · Networking · Inversion

May 13, 2021

Momentum Residual Neural Networks


Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

From arXiv, 24 pages

The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), are invertible. Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block. We show that Momentum ResNets can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (ODEs) and exactly characterize how adding momentum progressively increases the representation capabilities of Momentum ResNets. Our analysis reveals that Momentum ResNets can learn any linear mapping up to a multiplicative factor, while ResNets cannot. In a learning to optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail. We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.
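The abstract does not spell out the modified forward rule, so the following is only a minimal sketch under stated assumptions. It assumes the momentum update takes the form v_next = gamma * v + (1 - gamma) * f(x), x_next = x + v_next, with gamma strictly between 0 and 1. The names `residual_fn`, `momentum_forward`, `momentum_inverse`, `run_forward`, and `run_inverse` are hypothetical, and the tanh residual is a stand-in for a real ResNet block. The point is to show why such a step can be undone algebraically, so that activations could be recomputed during backpropagation rather than stored.

```python
import math


def residual_fn(x, weight=0.5):
    """Hypothetical residual function f(x); stands in for a ResNet block."""
    return [math.tanh(weight * xi) for xi in x]


def momentum_forward(x, v, gamma=0.9, f=residual_fn):
    """One assumed momentum residual step: update velocity, then position."""
    fx = f(x)
    v_next = [gamma * vi + (1.0 - gamma) * fi for vi, fi in zip(v, fx)]
    x_next = [xi + vi for xi, vi in zip(x, v_next)]
    return x_next, v_next


def momentum_inverse(x_next, v_next, gamma=0.9, f=residual_fn):
    """Recover (x, v) from (x_next, v_next) by undoing the step above."""
    x = [xi - vi for xi, vi in zip(x_next, v_next)]
    fx = f(x)  # f is re-evaluated at the recovered input, never inverted
    v = [(vi - (1.0 - gamma) * fi) / gamma for vi, fi in zip(v_next, fx)]
    return x, v


def run_forward(x, v, depth, gamma=0.9):
    """Apply `depth` momentum residual steps."""
    for _ in range(depth):
        x, v = momentum_forward(x, v, gamma)
    return x, v


def run_inverse(x, v, depth, gamma=0.9):
    """Undo `depth` momentum residual steps, last layer first."""
    for _ in range(depth):
        x, v = momentum_inverse(x, v, gamma)
    return x, v


if __name__ == "__main__":
    x0, v0 = [0.3, -1.2, 2.0], [0.0, 0.0, 0.0]
    x_out, v_out = run_forward(x0, v0, depth=8)
    x_rec, v_rec = run_inverse(x_out, v_out, depth=8)
    print(max(abs(a - b) for a, b in zip(x_rec, x0)))
```

In this sketch the inverse only needs forward evaluations of `f`, which is why no constraint on the block itself is required. In floating point the reconstruction is approximate, since each division by gamma amplifies rounding error slightly.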



Related content

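The momentum method (Polyak, 1964) aims to accelerate learning, particularly in the presence of high curvature, small but consistent gradients, or noisy gradients. The momentum algorithm accumulates an exponentially decaying moving average of past gradients and keeps moving in that direction.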
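As a rough illustration of the momentum idea described above, here is a minimal sketch of gradient descent with a velocity term on a one-dimensional quadratic. The function names, the objective 0.5 * (x - 3)^2, and the values of `lr`, `beta`, and `steps` are all assumptions chosen for the example, not taken from the page.

```python
def quadratic_grad(x):
    """Gradient of the illustrative objective 0.5 * (x - 3)^2."""
    return x - 3.0


def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent with a momentum (velocity) term."""
    x = x0
    velocity = 0.0
    for _ in range(steps):
        # Exponentially decaying accumulation of past gradients:
        # older gradients are down-weighted by a factor beta per step.
        velocity = beta * velocity + grad(x)
        # Keep moving along the accumulated direction.
        x = x - lr * velocity
    return x


if __name__ == "__main__":
    print(momentum_descent(quadratic_grad, x0=0.0))
```

With these assumed settings the iterate oscillates around the minimizer at 3 with shrinking amplitude, which is the typical behaviour of a momentum method on a well-conditioned quadratic.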

