Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. However, it typically involves a three-step process--pre-training, pruning, and re-training--that is computationally expensive because the dense model must be fully pre-trained. Fortunately, several works have empirically shown that high-performing subnetworks can be discovered via pruning without fully pre-training the dense network. Aiming to theoretically analyze the amount of dense network pre-training needed for a pruned network to perform well, we derive a theoretical bound on the number of SGD pre-training iterations of a two-layer, fully-connected network, beyond which pruning via greedy forward selection yields a subnetwork that achieves low training error. This threshold depends logarithmically on the size of the dataset, meaning that experiments with larger datasets require more pre-training for subnetworks obtained via pruning to perform well. We empirically demonstrate the validity of our theoretical results across a variety of architectures and datasets, including fully-connected networks trained on MNIST and several deep convolutional neural network (CNN) architectures trained on CIFAR10 and ImageNet.
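To make the pruning procedure concrete, below is a minimal NumPy sketch of greedy forward selection applied to a pre-trained two-layer ReLU network. The weight matrix `W`, output coefficients `a`, the mean-squared training objective, and the averaged-subnetwork form are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def greedy_forward_selection(W, a, X, y, k):
    """Greedily select k neurons from a pre-trained two-layer ReLU network
    f(x) = sum_i a[i] * relu(W[i] @ x), at each step adding the neuron whose
    inclusion most reduces the training loss of the averaged subnetwork.
    (Illustrative sketch; objective and network form are assumptions.)"""
    # Per-neuron contributions a_i * relu(w_i^T x), shape (n_samples, n_neurons).
    H = np.maximum(X @ W.T, 0.0) * a
    selected, running_sum = [], np.zeros(X.shape[0])
    for _ in range(k):
        # Candidate subnetwork outputs: average of selected contributions plus
        # each candidate neuron (neurons may be selected more than once).
        preds = (running_sum[:, None] + H) / (len(selected) + 1)
        losses = np.mean((preds - y[:, None]) ** 2, axis=0)
        best = int(np.argmin(losses))
        selected.append(best)
        running_sum += H[:, best]
    return selected

# Example: prune a randomly initialized 100-neuron network to 10 neurons on toy data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 20)), rng.normal(size=256)
W, a = rng.normal(size=(100, 20)), rng.normal(size=100)
print(greedy_forward_selection(W, a, X, y, k=10))
```

In the setting analyzed in the paper, a loop of this kind is run on a dense network that has received only a limited number of SGD pre-training iterations, and the question is how many such iterations suffice for the selected subnetwork to reach low training error.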