Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called \emph{Maxmin Q-learning}, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
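To make the idea concrete, here is a minimal tabular sketch of the Maxmin Q-learning update described above: N independent action-value estimates are maintained, and the bootstrap target takes the minimum over the N estimates before maximizing over next actions. The function name, hyperparameter defaults, and the choice to update a single randomly selected estimate are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def maxmin_q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99, rng=None):
    """One tabular Maxmin Q-learning update (illustrative sketch).

    Q: array of shape (N, n_states, n_actions) holding N action-value estimates.
    The parameter N controls the estimation bias: N=1 recovers Q-learning,
    larger N pushes the estimate toward underestimation.
    """
    rng = rng or np.random.default_rng()
    N = Q.shape[0]
    # Minimum over the N estimates at the next state, then maximum over actions.
    q_min_next = Q[:, s_next, :].min(axis=0)            # shape: (n_actions,)
    target = r + (0.0 if done else gamma * q_min_next.max())
    # Update one randomly chosen estimate toward the shared target.
    i = rng.integers(N)
    Q[i, s, a] += alpha * (target - Q[i, s, a])
    return Q
```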