Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks

April 22, 2023

Hui Jin, Guido Montúfar

From arXiv: 97 pages, 14 figures. Added a discussion of SGD and implications for generalization.

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
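To make the constant-penalty case concrete, the following is a minimal sketch of natural cubic spline interpolation, the function the abstract identifies as the limiting solution when the curvature penalty is constant. It is not the authors' code and does not train a network. It only constructs the interpolant that minimizes the integral of the squared second derivative among functions fitting the data. The function names `second_derivatives` and `natural_cubic_spline` are hypothetical, and the sketch assumes strictly increasing inputs and uses only the Python standard library.

```python
from bisect import bisect_right


def second_derivatives(xs, ys):
    """Second derivatives m[i] = s''(xs[i]) of the natural cubic spline.

    Natural boundary conditions fix m[0] = m[-1] = 0; the interior values
    solve a tridiagonal linear system (Thomas algorithm).
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two points and matching lengths")
    if any(xs[i + 1] <= xs[i] for i in range(n - 1)):
        raise ValueError("xs must be strictly increasing")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n
    k = n - 2  # number of interior knots
    if k == 0:
        return m
    # Row j corresponds to interior knot i = j + 1:
    # h[i-1]*m[i-1] + 2*(h[i-1]+h[i])*m[i] + h[i]*m[i+1] = rhs
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    rhs = [
        6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        for i in range(1, n - 1)
    ]
    # Forward elimination: sub-diagonal of row j and super-diagonal of
    # row j - 1 are both h[j].
    for j in range(1, k):
        w = h[j] / diag[j - 1]
        diag[j] -= w * h[j]
        rhs[j] -= w * rhs[j - 1]
    # Back substitution: super-diagonal of row j is h[j + 1].
    sol = [0.0] * k
    sol[k - 1] = rhs[k - 1] / diag[k - 1]
    for j in range(k - 2, -1, -1):
        sol[j] = (rhs[j] - h[j + 1] * sol[j + 1]) / diag[j]
    m[1:n - 1] = sol
    return m


def natural_cubic_spline(xs, ys):
    """Return a callable s(x) interpolating the data on [xs[0], xs[-1]]."""
    xs, ys = list(xs), list(ys)
    m = second_derivatives(xs, ys)
    n = len(xs)

    def s(x):
        if x < xs[0] or x > xs[-1]:
            raise ValueError("x lies outside the interpolation range")
        # Index of the interval [xs[i], xs[i+1]] that contains x.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        hi = xs[i + 1] - xs[i]
        a = xs[i + 1] - x
        b = x - xs[i]
        return (
            m[i] * a ** 3 / (6.0 * hi)
            + m[i + 1] * b ** 3 / (6.0 * hi)
            + (ys[i] / hi - m[i] * hi / 6.0) * a
            + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * b
        )

    return s
```

Calling `natural_cubic_spline(xs, ys)` on the training inputs and targets returns a function that passes through every data point and has zero second derivative at both end points. For data that lie on a straight line it reproduces that line exactly, which matches the reading of the penalty as a measure of curvature.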
