Several recent works [40, 24] observed an interesting phenomenon in neural network pruning: a larger finetuning learning rate can significantly improve the final performance. Unfortunately, the reason behind it has remained elusive to date. This paper aims to explain it through the lens of dynamical isometry [42]. Specifically, we examine neural network pruning from an unusual perspective, pruning as initialization for finetuning, and ask whether the inherited weights serve as a good initialization for finetuning. The insights from dynamical isometry suggest a negative answer. Despite its critical role, this issue has not been well recognized by the community so far. In this paper, we show that understanding this problem is very important: on top of explaining the aforementioned mystery about the larger finetuning learning rate, it also unveils the mystery about the value of pruning [5, 30]. Besides a clearer theoretical understanding of pruning, resolving the problem also brings considerable performance benefits in practice.
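For reference, the sketch below recalls the standard formulation of dynamical isometry [42] that our argument relies on; the notation (a network f with input-output Jacobian J) is our own shorthand rather than anything specific to the cited work.

```latex
% Dynamical isometry (standard formulation): the singular values of the
% network's input-output Jacobian should concentrate around 1, so that
% signals and gradients are neither amplified nor attenuated across layers.
\[
    J = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}, \qquad
    \sigma_i(J) \approx 1 \quad \text{for all } i .
\]
```

Intuitively, when pruning removes weights, the singular value spectrum of J can drift away from 1, which is why the inherited weights may be a poor starting point for finetuning.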