In modern optimization methods used in deep learning, each update depends on the history of past iterates, often referred to as memory, and this dependence decays rapidly as the iterates recede into the past. For example, gradient descent with momentum has exponentially decaying memory through its exponentially averaged past gradients. We introduce a general technique for identifying a memoryless algorithm that approximates an optimization algorithm with memory. The memoryless algorithm is obtained by replacing all past iterates in the update with the current one and then adding a correction term arising from the memory (itself a function of the current iterate). This correction term can be interpreted as a perturbation of the loss, and the nature of this perturbation can inform how memory implicitly (anti-)regularizes the optimization dynamics. As an application of our theory, we find that Lion does not have the kind of memory-induced implicit anti-regularization that AdamW does, providing a theory-based explanation for Lion's recently documented better generalization performance.
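To make the replacement step concrete, here is a minimal numerical sketch assuming heavy-ball momentum as the algorithm with memory. The toy quadratic loss, the hyperparameter values, and the function names are illustrative assumptions, and the memory-induced correction term described above is deliberately omitted rather than guessed at; the sketch only shows how collapsing the history onto the current iterate yields a memoryless, rescaled gradient step.

```python
import numpy as np

# Heavy-ball momentum written explicitly as a sum over all past gradients,
#   x_{k+1} = x_k - lr * sum_{i<=k} beta**(k-i) * grad(x_i),
# which makes the exponentially decaying memory visible. Substituting the
# current iterate x_k for every past iterate x_i collapses the sum into a
# single gradient step with effective step size lr / (1 - beta). The paper's
# construction additionally adds a memory-induced correction term (a function
# of x_k alone); that term is not reproduced in this sketch.

def grad(x):
    """Gradient of a toy quadratic loss f(x) = 0.5 * x^T A x (assumed example)."""
    A = np.diag([1.0, 10.0])
    return A @ x

def momentum_with_memory(x0, lr=0.01, beta=0.9, steps=200):
    """Update depends on the full history of iterates (memory), with exponential decay."""
    x, past_grads = x0.copy(), []
    for _ in range(steps):
        past_grads.append(grad(x))
        update = sum(beta ** (len(past_grads) - 1 - i) * g
                     for i, g in enumerate(past_grads))
        x = x - lr * update
    return x

def memoryless_approximation(x0, lr=0.01, beta=0.9, steps=200):
    """All past iterates replaced by the current one: a single rescaled gradient step."""
    x = x0.copy()
    for _ in range(steps):
        x = x - (lr / (1.0 - beta)) * grad(x)
    return x

x0 = np.array([1.0, 1.0])
print(momentum_with_memory(x0))      # iterate of the algorithm with memory
print(memoryless_approximation(x0))  # iterate of the memoryless surrogate
```

Writing the momentum update as an explicit sum over past gradients, rather than as a recursively updated buffer, is what lets the substitution "past iterate → current iterate" be applied term by term before the sum is collapsed.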