A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth weight-tied model with input injection. Rather than unrolling infinitely many layers, a DEQ solves for the equilibrium point directly via root-finding and computes gradients by implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Under a condition on the initial equilibrium point, we show that a unique equilibrium point exists throughout training, and that gradient descent provably converges to a globally optimal solution at a linear rate for the quadratic loss. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulties in the non-asymptotic analysis of infinite-depth weight-tied models.
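To make the forward and backward computations described above concrete, the following is a minimal numerical sketch, not the paper's exact setup: a single DEQ layer z* = tanh(W z* + U x), with the equilibrium found by naive fixed-point iteration and the gradient of a quadratic loss with respect to W obtained via the implicit function theorem. The tanh nonlinearity, the dimensions, and the initialization scales are illustrative assumptions; practical DEQs typically use a quasi-Newton root-finder such as Broyden's method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                                  # hidden width (assumed)
W = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))    # small scale -> contraction (assumed)
U = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))    # input-injection weights
x = rng.normal(size=d)                                 # injected input
y = rng.normal(size=d)                                 # regression target

def f(z):
    # One application of the weight-tied layer with input injection.
    return np.tanh(W @ z + U @ x)

# Forward pass: find the equilibrium z* = f(z*) by fixed-point iteration
# (stands in for a proper root-finder).
z = np.zeros(d)
for _ in range(200):
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next
z_star = z

# Quadratic loss and its gradient with respect to the equilibrium point.
loss = 0.5 * np.sum((z_star - y) ** 2)
dl_dz = z_star - y

# Backward pass via implicit differentiation: with g(z, W) = f(z, W) - z = 0
# at z*, the implicit function theorem gives dz*/dW = (I - J)^{-1} df/dW,
# where J = df/dz at z*. We form the adjoint u = (I - J)^{-T} dl/dz and
# assemble dl/dW without ever unrolling the infinite-depth model.
a = W @ z_star + U @ x
D = np.diag(1.0 - np.tanh(a) ** 2)                    # tanh' at the equilibrium
J = D @ W                                             # df/dz evaluated at z*
u = np.linalg.solve((np.eye(d) - J).T, dl_dz)         # adjoint vector
dl_dW = np.outer(D @ u, z_star)                       # gradient of the loss w.r.t. W

print("loss:", loss)
print("||dl/dW||:", np.linalg.norm(dl_dW))
```

The contraction-scale initialization of W is what keeps the fixed-point iteration convergent in this sketch; the paper's analysis instead guarantees existence and uniqueness of the equilibrium throughout training via the stated condition on the initial equilibrium point.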