AWD3: Dynamic Reduction of the Estimation Bias - Zhuanzhi Papers


November 12, 2021

AWD3: Dynamic Reduction of the Estimation Bias


Dogan C. Cicek, Enes Duran, Baturay Saglam, Kagan Kaya, Furkan B. Mutlu, Suleyman S. Kozat

From arXiv. Accepted at the 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2021).

Value-based deep reinforcement learning (RL) algorithms suffer from estimation bias, caused primarily by function approximation and temporal difference (TD) learning. This bias produces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques have been proposed to tackle it, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI Gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.
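To make the role of the weighting hyper-parameter beta concrete, here is a minimal sketch of a weighted twin-critic TD target. It assumes that the target blends the minimum of two critic estimates (which tends to underestimate) with their average (which tends to overestimate), with beta as the blending weight. That form is an assumption and is not stated in the abstract. The function name `weighted_twin_target` and its parameters are hypothetical. The abstract does not describe how AWD3 adapts beta, so beta is simply passed in as a fixed number.

```python
from typing import List, Sequence


def weighted_twin_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    beta: float,
    gamma: float = 0.99,
) -> List[float]:
    """Compute TD targets from two critic estimates of the next state-action value.

    Assumed form (not taken from the abstract):
        y = r + gamma * (1 - done) * (beta * min(q1, q2) + (1 - beta) * (q1 + q2) / 2)
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    targets = []
    for r, d, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        pessimistic = min(q1, q2)      # clipped double-Q style estimate
        average = 0.5 * (q1 + q2)      # plain average of the two critics
        blended = beta * pessimistic + (1.0 - beta) * average
        # Terminal transitions (done == 1) do not bootstrap from the next state.
        targets.append(r + gamma * (1.0 - d) * blended)
    return targets


# Usage: one non-terminal transition with critic estimates 2.0 and 4.0.
example = weighted_twin_target([1.0], [0.0], [2.0], [4.0], beta=0.5, gamma=0.5)
```

Under this assumed form, beta = 1 uses only the minimum of the two critics and beta = 0 uses only their average, so a value in between trades one kind of bias against the other.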

