This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks. Specifically, we provide numerical evidence that in large-scale neural network training (e.g., ImageNet + ResNet101 and WT103 + TransformerXL models), the neural network's weights do not converge to stationary points where the gradient of the loss is zero. Remarkably, however, we observe that even though the weights do not converge to stationary points, progress in minimizing the loss function halts and the training loss stabilizes. Inspired by this observation, we propose a new perspective based on the ergodic theory of dynamical systems to explain this phenomenon. Rather than studying the evolution of the weights themselves, we study the evolution of their distribution. We prove convergence of the distribution of weights to an approximate invariant measure, thereby explaining how the training loss can stabilize without the weights necessarily converging to stationary points. We further discuss how this perspective can better align optimization theory with empirical observations in machine learning practice.
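As a rough sketch of the ergodic-theoretic viewpoint (the notation below is illustrative and not taken verbatim from the abstract), stochastic gradient descent with i.i.d. mini-batches can be viewed as a Markov chain on the weights, and an invariant measure is a distribution that the chain's one-step transition kernel leaves unchanged:

\[
  w_{k+1} \;=\; w_k - \eta\,\nabla f(w_k;\xi_k), \qquad \xi_k \stackrel{\text{i.i.d.}}{\sim} \mathcal{D},
\]
\[
  \mu \text{ is invariant for the transition kernel } P
  \quad\Longleftrightarrow\quad
  \mu(A) \;=\; \int P(w, A)\,\mathrm{d}\mu(w) \quad \text{for every measurable set } A,
\]

where $P(w,A) = \Pr[\,w_{k+1}\in A \mid w_k = w\,]$. Under an (approximately) invariant $\mu$, the expected loss $\mathbb{E}_{w\sim\mu}[F(w)]$ is (approximately) constant across iterations, which is consistent with the training loss stabilizing even though the individual iterates $w_k$ keep moving and need not approach a stationary point.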