Training a Two Layer ReLU Network Analytically - Zhuanzhi (专知) Papers


ReLU · gradient · square loss · loss · analytical method

April 6, 2023


From arXiv: 17 pages, 11 figures.

Neural networks are usually trained with variants of gradient-descent-based optimization algorithms such as stochastic gradient descent (SGD) or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we explore an algorithm for training two-layer neural networks with ReLU-like activations and the square loss that alternately finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than SGD or the Adam optimizer, obtaining significantly smaller training loss values on four of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
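The alternating scheme described above can be illustrated with a minimal NumPy sketch. This is our own reconstruction under stated assumptions, not the authors' code: a single-output network f(x) = Σ_j v_j ReLU(w_j · x) with no output bias, a hidden-layer bias implemented by appending a constant input feature, and `np.linalg.lstsq` used to reach a critical point of the square loss in each block. With W fixed, the loss is a linear least-squares problem in v. With v and the activation pattern M fixed, ReLU(w_j · x_i) = M_ij (w_j · x_i), so the model is also linear in W. Because the new W can flip the activation pattern, the sketch only accepts it when the true ReLU loss does not increase; that safeguard is our assumption, not something the abstract specifies. The function names (`loss`, `solve_output_layer`, `solve_hidden_layer`, `train`) are hypothetical.

```python
# Hypothetical sketch of alternating analytical solves for a two-layer ReLU
# network with square loss; not the paper's implementation.
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def loss(W, v, X, y):
    """Half mean squared error of f(x) = v . relu(W x)."""
    r = relu(X @ W.T) @ v - y
    return 0.5 * np.mean(r ** 2)


def solve_output_layer(W, X, y):
    # With W fixed, f is linear in v: least squares on the hidden activations.
    H = relu(X @ W.T)                          # (n, h)
    v, *_ = np.linalg.lstsq(H, y, rcond=None)
    return v


def solve_hidden_layer(W, v, X, y):
    # Freeze the activation pattern M[i, j] = 1[w_j . x_i > 0]; then
    # f(x_i) = sum_j sum_k v_j M[i, j] X[i, k] W[j, k] is linear in W.
    M = (X @ W.T > 0).astype(X.dtype)          # (n, h)
    n, d = X.shape
    h = W.shape[0]
    # Column j*d + k of A holds v_j * M[i, j] * X[i, k], matching W.ravel().
    A = ((M * v)[:, :, None] * X[:, None, :]).reshape(n, h * d)
    w_flat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w_flat.reshape(h, d)


def train(X, y, hidden=10, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: a constant feature acts as the hidden-layer bias.
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    W = rng.normal(size=(hidden, X.shape[1]))
    v = solve_output_layer(W, X, y)
    history = [loss(W, v, X, y)]
    for _ in range(iters):
        W_new = solve_hidden_layer(W, v, X, y)
        # The solve assumed the old pattern; keep W_new only if the true
        # ReLU loss does not go up (our safeguard, not from the abstract).
        if loss(W_new, v, X, y) <= history[-1]:
            W = W_new
        v = solve_output_layer(W, X, y)
        history.append(loss(W, v, X, y))
    return W, v, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2     # synthetic toy target
    W, v, history = train(X, y, hidden=10, iters=30)
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

There is no learning rate or step size here, which is in the spirit of the abstract's "virtually no tuning parameters", although the number of hidden units and iterations still has to be chosen. Each W step solves one least-squares problem with an n × (h·d) design matrix, so this naive form is only practical for small networks.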

