A fundamental property of deep learning normalization techniques, such as batch normalization, is that they make the pre-normalization parameters scale-invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with a varying effective learning rate (ELR), as studied in prior work. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both through a theoretical examination of a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on the training of both regular and scale-invariant neural networks. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
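The setup described above can be sketched in a minimal toy form. The snippet below, an illustration rather than the paper's actual procedure, uses a hypothetical scale-invariant loss (it depends only on the direction of the weight vector, with an assumed target direction `t` standing in for the intrinsic optimum) and runs fixed-ELR gradient descent directly on the unit sphere: project the gradient onto the tangent space, step, and renormalize. A moderate ELR lands in the convergence regime; a much larger value would instead produce chaotic wandering or divergence.

```python
import numpy as np

# Toy scale-invariant loss: it depends only on the direction of w,
# so loss(c * w) == loss(w) for any c > 0. The target direction `t`
# is a hypothetical stand-in for the intrinsic optimum.
def loss(w, t):
    return 1.0 - (w @ t) / np.linalg.norm(w)

def grad(w, t):
    n = np.linalg.norm(w)
    return -t / n + (w @ t) * w / n**3

rng = np.random.default_rng(0)
t = rng.normal(size=5)
t /= np.linalg.norm(t)
w = rng.normal(size=5)

# Scale invariance: rescaling w leaves the loss unchanged.
assert abs(loss(w, t) - loss(3.7 * w, t)) < 1e-12

# Fixed-ELR training directly on the unit sphere: project the
# gradient onto the tangent space, take a step, renormalize.
elr = 0.1  # a moderate value, illustrating the convergence regime
u = w / np.linalg.norm(w)
for _ in range(200):
    g = grad(u, t)
    g_tan = g - (g @ u) * u        # tangent-space projection
    u = u - elr * g_tan
    u /= np.linalg.norm(u)         # retract back onto the sphere
```

With this ELR the iterates stay on the sphere and the loss decays toward zero; the same loop with a very large ELR overshoots the optimum each step, which is the qualitative distinction between the regimes the abstract refers to.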