Value estimation in policy gradient methods is a fundamental problem. Generalized Advantage Estimation (GAE) is an exponentially-weighted estimator of the advantage function, analogous to the $\lambda$-return, that substantially reduces the variance of policy gradient estimates at the cost of bias. In practice, a truncated GAE is used because trajectories are incomplete, which introduces a large bias into the estimate. To address this challenge, instead of using the entire truncated GAE, we propose to use only a part of it when computing updates, which significantly reduces the bias caused by the incomplete trajectory. We conduct experiments in MuJoCo and $\mu$RTS to investigate the effect of different partial coefficients and sampling lengths, and show that our partial GAE approach yields better empirical results in both environments.
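The idea can be sketched in code. Below is a minimal NumPy illustration, under the assumption that "taking a part" of the truncated GAE means keeping only the first $\lceil p \cdot (T-t) \rceil$ TD-error terms of each truncated sum, where `partial_coef` ($p$) is a hypothetical parameter name; the later terms, which lean most heavily on the missing trajectory tail, are the ones dropped. This is an interpretive sketch, not the paper's exact algorithm.

```python
import numpy as np

def truncated_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    # Standard truncated GAE over a length-T rollout:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t = sum_{l=0}^{T-t-1} (gamma * lam)^l * delta_{t+l}
    T = len(rewards)
    values_ext = np.append(values, last_value)  # bootstrap with V(s_T)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):  # backward recursion over the rollout
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

def partial_gae(rewards, values, last_value,
                gamma=0.99, lam=0.95, partial_coef=0.5):
    # Hypothetical "partial GAE": keep only the first ceil(p * (T - t))
    # TD-error terms of each truncated sum. The dropped tail terms are the
    # ones most biased by the trajectory being cut off at T.
    T = len(rewards)
    values_ext = np.append(values, last_value)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    adv = np.zeros(T)
    for t in range(T):
        k = max(1, int(np.ceil(partial_coef * (T - t))))
        weights = (gamma * lam) ** np.arange(k)
        adv[t] = np.sum(weights * deltas[t:t + k])
    return adv
```

With `partial_coef=1.0` the partial estimator keeps every term and coincides with the ordinary truncated GAE; smaller values trade a little extra variance structure for less truncation bias.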