


June 11, 2021

Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks


Quynh Nguyen, Marco Mondelli, Guido Montufar

From arXiv, ICML 2021

A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of $N$ neurons, $N$ being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps.
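The central quantity in the abstract, the smallest eigenvalue of the finite-width NTK Gram matrix, can be illustrated numerically. The following is a minimal NumPy sketch and not the authors' code: the He-style initialization, the absence of biases, the scalar linear output layer, the layer widths, the data scaling, and all function names (`init_params`, `forward`, `jacobian_row`, `ntk_gram`) are assumptions made for illustration. It builds the Jacobian $J$ of the network output with respect to all weights and forms $K = J J^\top$.

```python
import numpy as np

def init_params(widths, rng):
    # Assumed He-style init: W_l has shape (n_in, n_out), entries N(0, 2 / n_in).
    return [rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(params, x):
    # ReLU hidden layers followed by a linear scalar output (no biases).
    h = x
    for W in params[:-1]:
        h = np.maximum(h @ W, 0.0)
    return (h @ params[-1])[0]

def jacobian_row(params, x):
    # Gradient of the scalar output with respect to all weights, flattened.
    acts, masks = [x], []
    h = x
    for W in params[:-1]:
        z = h @ W
        mask = (z > 0).astype(z.dtype)
        h = z * mask
        acts.append(h)      # acts[l] is the input to layer l
        masks.append(mask)  # masks[l] is the ReLU pattern of layer l
    grads = [None] * len(params)
    delta = np.ones(1)      # derivative of the output with respect to itself
    for l in range(len(params) - 1, -1, -1):
        grads[l] = np.outer(acts[l], delta)
        if l > 0:
            # Backpropagate through W_l and the ReLU of layer l - 1.
            delta = (params[l] @ delta) * masks[l - 1]
    return np.concatenate([g.ravel() for g in grads])

def ntk_gram(params, X):
    # Empirical NTK Gram matrix K = J J^T, with one Jacobian row per sample.
    J = np.stack([jacobian_row(params, x) for x in X])
    return J @ J.T

rng = np.random.default_rng(0)
d, N = 5, 8
widths = [d, 64, 32, 1]                      # hypothetical architecture
params = init_params(widths, rng)
X = rng.normal(size=(N, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)  # assumed scaling
K = ntk_gram(params, X)
lambda_min = np.linalg.eigvalsh(K)[0]        # eigenvalues in ascending order
print("smallest NTK eigenvalue:", lambda_min)
```

Because $K$ is a Gram matrix it is positive semidefinite, so `lambda_min` is nonnegative up to rounding; the bounds in the paper concern how far it stays from zero as the widths and the number of samples $N$ vary.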

