Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g., value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return. To better capture the commonalities and identify key differences between policy optimization methods, we develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function. In particular, we identify a parameterized space of approximate gradient updates for policy optimization that is highly structured, yet covers both classical and recent examples, including PPO. As a result, we obtain novel yet well-motivated updates that generalize existing algorithms in a way that can improve both convergence speed and final result quality. An experimental investigation demonstrates that the additional degrees of freedom provided by the parameterized family of updates can be leveraged to obtain non-trivial improvements both in synthetic domains and on popular deep RL benchmarks.
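To make the "gradient form and scaling function" view concrete, the following is a minimal illustrative sketch, not the paper's exact parameterization: many policy optimization updates can be written as an average of a per-sample scaling weight times the score function ∇ log π. The names `surrogate_update`, `vanilla_pg`, and `ppo_like`, and the specific functional forms of the scaling weights, are assumptions introduced here for illustration only.

```python
import numpy as np

def surrogate_update(grad_log_pi, advantages, ratios, scaling_fn):
    """Generic update direction: mean over samples of f(ratio, advantage) * grad log pi(a|s).

    grad_log_pi: (N, D) score-function gradients, one per sample.
    advantages:  (N,) estimated advantages.
    ratios:      (N,) importance ratios pi_new / pi_behavior.
    scaling_fn:  maps (ratios, advantages) -> (N,) per-sample scaling weights.
    """
    weights = scaling_fn(ratios, advantages)               # (N,)
    return (weights[:, None] * grad_log_pi).mean(axis=0)   # (D,)

# Two familiar members of such a family (illustrative scaling choices, hypothetical names):
def vanilla_pg(rho, adv):
    # REINFORCE-style scaling: weight each score by the advantage alone.
    return adv

def ppo_like(rho, adv, eps=0.2):
    # Scaling implied by the gradient of a PPO-style clipped surrogate:
    # rho * adv where the unclipped term is active, zero where clipping
    # would stop the gradient.
    unclipped = rho * adv
    clipped = np.clip(rho, 1.0 - eps, 1.0 + eps) * adv
    return np.where(unclipped <= clipped, unclipped, 0.0)

# Usage with random placeholder data.
rng = np.random.default_rng(0)
g = rng.normal(size=(128, 8))                    # per-sample grad log pi
adv = rng.normal(size=128)                       # advantage estimates
rho = np.exp(rng.normal(scale=0.1, size=128))    # importance ratios

print(surrogate_update(g, adv, rho, vanilla_pg))
print(surrogate_update(g, adv, rho, ppo_like))
```

Swapping `scaling_fn` is the only change between the two updates above, which is the sense in which a single parameterized family can cover both classical policy-gradient and PPO-style updates.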