Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50 and MobileNets on ImageNet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found at github.com/google-research/rigl.
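To make the topology-update idea concrete, the following is a minimal sketch of one drop-and-grow connectivity step, assuming the update drops the lowest-magnitude active weights and regrows the same number of connections where the (infrequently computed) dense gradient is largest. The function name `rigl_update`, the `drop_fraction` parameter, and the zero initialization of regrown weights are illustrative assumptions, not the reference implementation from the repository above.

```python
import numpy as np

def rigl_update(weights, mask, grad, drop_fraction=0.3):
    """Sketch of one connectivity update for a single layer.

    weights, mask, grad: arrays of the same shape; mask is binary (1 = active).
    Returns updated (weights, mask) with the same number of active connections.
    """
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    k = int(drop_fraction * active.size)

    # Drop: active connections with the smallest parameter magnitude.
    drop_idx = active[np.argsort(np.abs(weights.ravel()[active]))[:k]]

    # Grow: inactive connections with the largest gradient magnitude
    # (the dense gradient is only needed at these infrequent update steps).
    grow_idx = inactive[np.argsort(-np.abs(grad.ravel()[inactive]))[:k]]

    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = 0
    new_mask[grow_idx] = 1

    new_weights = weights.copy().ravel()
    new_weights[drop_idx] = 0.0   # dropped connections are zeroed out
    new_weights[grow_idx] = 0.0   # assumption: regrown weights start at zero
    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)
```

Because the number of dropped and regrown connections is equal, the parameter count and per-step computational cost stay fixed throughout training, matching the constraint described in the abstract.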