2021 年 4 月 20 日

Ensemble Bootstrapping for Q-Learning

Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
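The estimation problem mentioned in the abstract (the maximal mean of a set of independent random variables) can be illustrated with a small simulation. The sketch below is not the authors' code, and the abstract does not spell out the EBQL update, so the fold-based estimator here is only one reading of "extending Double-Q-learning to ensembles": one fold of the samples selects the arg-max variable and the remaining folds evaluate it. The function names, the Gaussian noise model, and the defaults (`n=12`, `k=6`, `trials=5000`) are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def single_estimator(samples):
    """Max of the per-variable sample means (the Q-learning-style estimate)."""
    return max(fmean(xs) for xs in samples)


def ensemble_estimator(samples, k):
    """Assumed ensemble-style estimate: fold 0 selects, the other folds evaluate.

    Each variable's samples are split into k interleaved folds. Fold 0 picks
    the index with the largest sample mean; the chosen variable is then
    evaluated with the mean of its remaining k - 1 folds.
    With k = 2 this reduces to the double estimator.
    """
    select_means = [fmean(xs[0::k]) for xs in samples]
    best = max(range(len(samples)), key=select_means.__getitem__)
    held_out = [x for i, x in enumerate(samples[best]) if i % k != 0]
    return fmean(held_out)


def simulate(means, n=12, k=6, trials=5000, seed=0):
    """Return {name: (bias, mse)} for the three estimators of max(means)."""
    rng = random.Random(seed)
    true_max = max(means)
    errors = {"single": [], "double": [], "ensemble": []}
    for _ in range(trials):
        # n unit-variance Gaussian samples per variable (illustrative choice).
        samples = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in means]
        errors["single"].append(single_estimator(samples) - true_max)
        errors["double"].append(ensemble_estimator(samples, 2) - true_max)
        errors["ensemble"].append(ensemble_estimator(samples, k) - true_max)
    return {
        name: (fmean(errs), fmean([e * e for e in errs]))
        for name, errs in errors.items()
    }


if __name__ == "__main__":
    # Ten variables with identical means, so the true maximal mean is 0.
    for name, (bias, mse) in simulate([0.0] * 10).items():
        print(f"{name:>8}: bias={bias:+.3f}  mse={mse:.3f}")
```

In this toy setting the single estimator is biased upward because the maximum is taken over noisy sample means, while the fold-based estimators decouple selection from evaluation. Giving more folds to evaluation lowers the variance of the evaluated mean, which is the intuition behind the lower-MSE claim; the simulation illustrates that claim under the stated assumptions and does not reproduce the paper's proof or experiments.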
