双学制的估算比值 (On the Estimation Bias in Double Q-Learning) - 专知论文

会员服务 ·

0

欠估计 · 有偏 · 估计/估计量 · Extensibility · 可约的 ·

2021 年 9 月 29 日

On the Estimation Bias in Double Q-Learning

翻译：双学制的估算比值

Zhizhou Ren,Guangxiang Zhu,Hao Hu,Beining Han,Jianglun Chen,Chongjie Zhang

from arxiv, Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value prediction and improving learning performance. However, as shown by prior work, double Q-learning is not fully unbiased and suffers from underestimation bias. In this paper, we show that such underestimation bias may lead to multiple non-optimal fixed points under an approximated Bellman operator. To address the concerns of converging to non-optimal stationary solutions, we propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning. This approach leverages an approximate dynamic programming to bound the target value. We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.

翻译：双Q学习是减少高估偏差的经典方法,其原因是在贝尔曼行动中采用了最高估计值,其深Q学习模式的变异在提供可靠的价值预测和改善学习业绩方面显示出很大的希望,然而,如以往工作所示,双Q学习并不完全公正,而且有低估偏差。在本文中,我们表明这种低估偏差可能导致一个接近贝尔曼操作员的多处非最佳固定点。为了解决对非最佳固定解决办法的趋同,我们建议一种简单而有效的办法,作为双重Q学习中低估偏差的部分解决办法。这种方法利用一种大致动态的方案拟订办法来约束目标价值。我们广泛评价了我们在阿塔里基准任务中的拟议方法,并表明其相对于基线算法的重大改进。

0

相关内容

欠估计

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

专知会员服务

93+阅读 · 2020年5月6日

【CVPR2020-台大】透视眼：学会透过障碍物看东西，Learning to See Through Obstructions

【CVPR2020-台大】透视眼：学会透过障碍物看东西，Learning to See Through Obstructions

专知会员服务

27+阅读 · 2020年4月3日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Off-Policy Correction For Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2021年11月22日

Converting ADMM to a Proximal Gradient for Efficient Sparse Estimation

Arxiv

0+阅读 · 2021年11月22日

Deep Probability Estimation

Arxiv

0+阅读 · 2021年11月21日

Resilient Consensus-based Multi-agent Reinforcement Learning with Function Approximation

Resilient Consensus-based Multi-agent Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年11月18日

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年11月17日

Deep Learning-Based Human Pose Estimation: A Survey

Arxiv

27+阅读 · 2020年12月24日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Feasibility Based Large Margin Nearest Neighbor Metric Learning

Arxiv

3+阅读 · 2018年5月2日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Positive-Unlabeled Learning with Non-Negative Risk Estimator

Arxiv

3+阅读 · 2017年11月4日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

【斯坦福】机器学习优化简明导论， Introduction to Optimization for Machine Learning

专知会员服务

93+阅读 · 2020年5月6日

【CVPR2020-台大】透视眼：学会透过障碍物看东西，Learning to See Through Obstructions

【CVPR2020-台大】透视眼：学会透过障碍物看东西，Learning to See Through Obstructions

专知会员服务

27+阅读 · 2020年4月3日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Off-Policy Correction For Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2021年11月22日

Converting ADMM to a Proximal Gradient for Efficient Sparse Estimation

Arxiv

0+阅读 · 2021年11月22日

Deep Probability Estimation

Arxiv

0+阅读 · 2021年11月21日

Resilient Consensus-based Multi-agent Reinforcement Learning with Function Approximation

Resilient Consensus-based Multi-agent Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年11月18日

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年11月17日

Deep Learning-Based Human Pose Estimation: A Survey

Arxiv

27+阅读 · 2020年12月24日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Feasibility Based Large Margin Nearest Neighbor Metric Learning

Arxiv

3+阅读 · 2018年5月2日

A Study on Overfitting in Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年4月20日

Positive-Unlabeled Learning with Non-Negative Risk Estimator

Arxiv

3+阅读 · 2017年11月4日

微信扫码咨询专知VIP会员