双学制的估算比值 (On the Estimation Bias in Double Q-Learning) - 专知论文

会员服务 ·

0

欠估计 · 有偏 · 估计/估计量 · Extensibility · 可约的 ·

2021 年 12 月 16 日

On the Estimation Bias in Double Q-Learning

翻译：双学制的估算比值

Zhizhou Ren,Guangxiang Zhu,Hao Hu,Beining Han,Jianglun Chen,Chongjie Zhang

from arxiv, Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value prediction and improving learning performance. However, as shown by prior work, double Q-learning is not fully unbiased and suffers from underestimation bias. In this paper, we show that such underestimation bias may lead to multiple non-optimal fixed points under an approximate Bellman operator. To address the concerns of converging to non-optimal stationary solutions, we propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning. This approach leverages an approximate dynamic programming to bound the target value. We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.

翻译：双Q学习是减少高估偏差的典型方法,其原因是在贝尔曼行动中采用了最高估计值,其深Q学习模式的变异在提供可靠的价值预测和改善学习业绩方面显示出很大的希望,然而,如以往工作所示,双Q学习并不完全公正,而且有低估偏差。在本文中,我们表明这种低估偏差可能导致在接近贝尔曼的操作者的领导下出现多种非最佳固定点。为了解决对非最佳固定解决办法的趋同问题,我们提议一种简单而有效的方法,作为双重Q学习中低估偏差的部分固定方法。这种方法利用一种大致动态的方案拟订方法来约束目标价值。我们广泛评价了我们拟议的阿塔里基准任务方法,并表明其相对于基线算法的重大改进。

0

相关内容

欠估计

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【Manning2020新书】深度强化学习实战，351页pdf，Deep Reinforcement Learning

【Manning2020新书】深度强化学习实战，351页pdf，Deep Reinforcement Learning

专知会员服务

293+阅读 · 2020年3月10日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

162+阅读 · 2019年10月12日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Machine Learning：十大机器学习算法

Machine Learning：十大机器学习算法

开源中国

21+阅读 · 2018年3月1日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Regularized Q-learning

Arxiv

0+阅读 · 2022年2月21日

A bandit-learning approach to multifidelity approximation

Arxiv

0+阅读 · 2022年2月21日

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

Arxiv

0+阅读 · 2022年2月20日

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2022年2月19日

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Arxiv

0+阅读 · 2022年2月19日

On Variance Estimation of Random Forests

On Variance Estimation of Random Forests

Arxiv

0+阅读 · 2022年2月18日

Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

Arxiv

0+阅读 · 2022年2月17日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【Manning2020新书】深度强化学习实战，351页pdf，Deep Reinforcement Learning

【Manning2020新书】深度强化学习实战，351页pdf，Deep Reinforcement Learning

专知会员服务

293+阅读 · 2020年3月10日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

162+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

《用于提升多域战备的大型语言模型辅助场景生成器》报告

《通过适应复杂环境与特种作战动态变革情报周期》报告

国防领域人工智能规模化应用的理论与实践

《多域作战背景下集体防御作战规划流程的建模与仿真、兵棋推演及人工智能方法》

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Machine Learning：十大机器学习算法

Machine Learning：十大机器学习算法

开源中国

21+阅读 · 2018年3月1日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Regularized Q-learning

Arxiv

0+阅读 · 2022年2月21日

A bandit-learning approach to multifidelity approximation

Arxiv

0+阅读 · 2022年2月21日

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

Arxiv

0+阅读 · 2022年2月20日

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2022年2月19日

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Arxiv

0+阅读 · 2022年2月19日

On Variance Estimation of Random Forests

On Variance Estimation of Random Forests

Arxiv

0+阅读 · 2022年2月18日

Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

Arxiv

0+阅读 · 2022年2月17日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

微信扫码咨询专知VIP会员