Tail averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, tail averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Our work focusses on improving generalization, which makes setting this hyperparameter even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early stopping unreliable. To alleviate these problems, we propose an anytime variant of tail averaging intended to improve generalization rather than pure optimization; it has no hyperparameters and approximates the optimal tail at all optimization steps. Our algorithm is based on two running averages with adaptive lengths bounded in terms of the optimal tail length, one of which achieves approximate optimality with some regularity. Requiring only additional storage for two sets of weights and periodic evaluation of the loss, the proposed two-tailed averaging algorithm is a practical and widely applicable method for improving generalization.
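To make the idea concrete, here is a minimal sketch of a two-running-average scheme on a 1-D parameter. The class name, method names, and the exact switching rule (adopt the shorter average whenever it evaluates at least as well, then restart it) are illustrative assumptions, not the paper's precise algorithm; they only demonstrate the mechanism of two tails with adaptive lengths and periodic loss evaluation.

```python
# Hedged sketch of two-tailed averaging on a single scalar weight.
# All names and the reset rule below are illustrative assumptions.

class TwoTailedAverage:
    """Maintain two running tail averages of optimization iterates.

    `long_avg` is the average reported to the user; `short_avg` is a
    candidate tail started later. When the short average evaluates at
    least as well as the long one, it replaces the long average and is
    itself restarted from scratch.
    """

    def __init__(self):
        self.long_avg, self.long_n = 0.0, 0
        self.short_avg, self.short_n = 0.0, 0

    def update(self, w):
        # Incorporate the latest iterate into both running averages.
        self.long_n += 1
        self.long_avg += (w - self.long_avg) / self.long_n
        self.short_n += 1
        self.short_avg += (w - self.short_avg) / self.short_n

    def maybe_switch(self, loss_fn):
        # Called periodically: if the shorter tail is at least as good,
        # adopt it and restart the short average (illustrative rule).
        if self.short_n and loss_fn(self.short_avg) <= loss_fn(self.long_avg):
            self.long_avg, self.long_n = self.short_avg, self.short_n
            self.short_avg, self.short_n = 0.0, 0

    def best(self):
        return self.long_avg


# Toy usage: a transient followed by noisy iterates around the optimum
# of loss(w) = (w - 1)^2. The early transient contaminates a plain
# Polyak average; the two-tailed scheme sheds it automatically.
loss = lambda w: (w - 1.0) ** 2
tta = TwoTailedAverage()
iterates = [10.0, 8.0, 6.0, 4.0, 2.0] + [1.0 + 0.1 * (-1) ** i for i in range(20)]
for t, w in enumerate(iterates, 1):
    tta.update(w)
    if t % 5 == 0:  # periodic evaluation of the loss
        tta.maybe_switch(loss)
```

In this toy run the two-tailed average lands near the optimum at 1.0, while the plain average over all iterates is pulled toward the large early transient.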