Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization - Zhuanzhi (专知) Papers

June 15, 2022

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization

Maxim Kaledin, Alexander Golubev, Denis Belomestny

Policy-gradient methods in reinforcement learning (RL) are general-purpose and widely applied in practice, but their performance suffers from the high variance of the gradient estimate. Several procedures have been proposed to reduce it, including actor-critic (AC) and advantage actor-critic (A2C) methods. Recently these approaches have gained a new perspective with the introduction of deep RL: both new control variates (CV) and new sub-sampling procedures have become available in the setting of complex models such as neural networks. A vital part of CV-based methods is the goal functional used to train the CV; the most popular one is the least-squares criterion of A2C. Despite its practical success, this criterion is not the only possible one. In this paper we investigate, for the first time, the performance of a criterion called Empirical Variance (EV). We observe in experiments that the EV criterion performs no worse than A2C and is sometimes considerably better. We also prove theoretical guarantees of actual variance reduction under very general assumptions and show that the A2C least-squares goal functional is an upper bound for the EV goal. Our experiments indicate that, in terms of variance reduction, EV-based methods are much better than A2C.
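To make the contrast between the two criteria concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it assumes a three-armed bandit with a softmax policy and a constant scalar baseline `b` playing the role of the control variate, and all names (`empirical_variance`, `least_squares_baseline`, `ev_baseline`) and numeric values are hypothetical choices for illustration. The least-squares baseline minimizes the mean squared error to the observed returns, while the EV baseline directly minimizes the sample variance of the gradient estimate `(R - b) * grad log pi(a)`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def empirical_variance(rewards, scores, b):
    """Trace of the sample covariance of g_k = (R_k - b) * score_k."""
    grads = (rewards - b)[:, None] * scores
    centered = grads - grads.mean(axis=0)
    return (centered ** 2).sum() / (len(rewards) - 1)

def least_squares_baseline(rewards):
    """argmin_b mean((R_k - b)^2), i.e. the sample mean of the returns."""
    return rewards.mean()

def ev_baseline(rewards, scores):
    """argmin_b of empirical_variance; the objective is quadratic in b,
    so the minimizer is available in closed form."""
    u = rewards[:, None] * scores
    u = u - u.mean(axis=0)               # centered R_k * score_k
    v = scores - scores.mean(axis=0)     # centered score_k
    return (u * v).sum() / (v ** 2).sum()

# Toy data (all values below are assumptions made for this sketch).
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 0.1])       # policy logits
pi = softmax(theta)
K = 2000                                  # number of sampled actions
actions = rng.choice(3, size=K, p=pi)
true_means = np.array([1.0, 3.0, -2.0])  # mean reward of each arm
rewards = true_means[actions] + rng.normal(scale=0.5, size=K)
scores = np.eye(3)[actions] - pi         # grad of log softmax w.r.t. logits

b_ls = least_squares_baseline(rewards)
b_ev = ev_baseline(rewards, scores)
v_ls = empirical_variance(rewards, scores, b_ls)
v_ev = empirical_variance(rewards, scores, b_ev)
print(f"least-squares baseline: b={b_ls:.3f}, empirical variance={v_ls:.3f}")
print(f"EV baseline:            b={b_ev:.3f}, empirical variance={v_ev:.3f}")
```

In this toy setting the EV baseline can never give a larger sample variance than the least-squares one, simply because it minimizes that quantity over the same family of constant baselines. How large the gap is depends on the policy and the reward distribution, and the sketch says nothing about state-dependent baselines or neural-network critics.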
