过参数化模型中随机高斯-牛顿法的非渐近优化与泛化界 (Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models) - 专知论文

会员服务 ·

0

泛化 · 参数化 · 参数化模型 · 牛顿法 · 分析 ·

Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

翻译：过参数化模型中随机高斯-牛顿法的非渐近优化与泛化界

An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependencies on the batch size, network width and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.

翻译：深度学习中的一个重要问题是高阶优化方法如何影响泛化性能。本文在回归场景下，针对具有光滑激活函数的过参数化深度神经网络，分析了采用Levenberg-Marquardt阻尼与小批量采样的随机高斯-牛顿（SGN）训练方法。我们的理论贡献包含两个方面：首先，通过在参数空间进行变度量分析，建立了显式依赖于批量大小、网络宽度与深度的有限时间收敛界；其次，利用过参数化机制中的一致稳定性，推导了SGN的非渐近泛化界，揭示了曲率、批量大小及过参数化对泛化性能的影响机制。理论结果表明，当优化路径上高斯-牛顿矩阵的最小特征值较大时，SGN将进入更优的泛化区域，此时稳定性界更为紧致。

0

相关内容

144页ppt《扩散模型》，Google DeepMind Sander Dieleman

144页ppt《扩散模型》，Google DeepMind Sander Dieleman

专知会员服务

46+阅读 · 11月21日

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

22+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

专知会员服务

20+阅读 · 2022年3月4日

知识图谱嵌入模型的概率标定,Probability Calibration for Knowledge Graph Embedding Models

专知会员服务

36+阅读 · 2020年5月11日

论文浅尝 | ICLR2020 - 基于组合的多关系图卷积网络

论文浅尝 | ICLR2020 - 基于组合的多关系图卷积网络

开放知识图谱

21+阅读 · 2020年4月24日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

从最大似然到EM算法：一致的理解方式

从最大似然到EM算法：一致的理解方式

PaperWeekly

19+阅读 · 2018年3月19日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

低维有限典型群与线传递2-(v,k,1)设计

国家自然科学基金

0+阅读 · 2015年12月31日

Jacobi行列式和Hilbert变换中的若干问题及应用

国家自然科学基金

0+阅读 · 2014年12月31日

随机系数和带跳的线性随机微分系统的H2/H∞控制

国家自然科学基金

0+阅读 · 2014年12月31日

基于对合否定的SBL公理化扩张系统的程度化推理及逻辑控制研究

国家自然科学基金

0+阅读 · 2014年12月31日

解析函数空间上的Toeplitz型奇异积分算子

国家自然科学基金

0+阅读 · 2014年12月31日

Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

Arxiv

0+阅读 · 12月4日

Random-Key Metaheuristic and Linearization for the Quadratic Multiple Constraints Variable-Sized Bin Packing Problem

Arxiv

0+阅读 · 11月15日

LongComp: Long-Tail Compositional Zero-Shot Generalization for Robust Trajectory Prediction

Arxiv

0+阅读 · 11月13日

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Arxiv

0+阅读 · 11月10日

Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

Arxiv

0+阅读 · 11月6日

VIP会员

文章信息

相关主题

参数化模型

相关VIP内容

144页ppt《扩散模型》，Google DeepMind Sander Dieleman

144页ppt《扩散模型》，Google DeepMind Sander Dieleman

专知会员服务

46+阅读 · 11月21日

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

【ICML2023】SEGA:结构熵引导的图对比学习锚视图

专知会员服务

22+阅读 · 2023年5月10日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

专知会员服务

20+阅读 · 2022年3月4日

知识图谱嵌入模型的概率标定,Probability Calibration for Knowledge Graph Embedding Models

专知会员服务

36+阅读 · 2020年5月11日

热门VIP内容

开通专知VIP会员享更多权益服务

前沿人工智能趋势报告（Frontier AI Trends Report）

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

音退化问题：基于输入操控的鲁棒语音转换综述

相关资讯

论文浅尝 | ICLR2020 - 基于组合的多关系图卷积网络

论文浅尝 | ICLR2020 - 基于组合的多关系图卷积网络

开放知识图谱

21+阅读 · 2020年4月24日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

从最大似然到EM算法：一致的理解方式

从最大似然到EM算法：一致的理解方式

PaperWeekly

19+阅读 · 2018年3月19日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

相关论文

Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

Arxiv

0+阅读 · 12月4日

Random-Key Metaheuristic and Linearization for the Quadratic Multiple Constraints Variable-Sized Bin Packing Problem

Arxiv

0+阅读 · 11月15日

LongComp: Long-Tail Compositional Zero-Shot Generalization for Robust Trajectory Prediction

Arxiv

0+阅读 · 11月13日

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Arxiv

0+阅读 · 11月10日

Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

Arxiv

0+阅读 · 11月6日

相关基金

低维有限典型群与线传递2-(v,k,1)设计

国家自然科学基金

0+阅读 · 2015年12月31日

Jacobi行列式和Hilbert变换中的若干问题及应用

国家自然科学基金

0+阅读 · 2014年12月31日

随机系数和带跳的线性随机微分系统的H2/H∞控制

国家自然科学基金

0+阅读 · 2014年12月31日

基于对合否定的SBL公理化扩张系统的程度化推理及逻辑控制研究

国家自然科学基金

0+阅读 · 2014年12月31日

解析函数空间上的Toeplitz型奇异积分算子

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员