Automating Control of Overestimation Bias for Continuous Reinforcement Learning

Bias correction techniques are used by most of the high-performing methods for off-policy reinforcement learning. However, these techniques rely on a pre-defined bias correction policy that is either not flexible enough or requires environment-specific tuning of hyperparameters. In this work, we present a simple data-driven approach for guiding bias correction. We demonstrate its effectiveness on Truncated Quantile Critics, a state-of-the-art continuous control algorithm. The proposed technique can adjust the bias correction across environments automatically. As a result, it eliminates the need for an extensive hyperparameter search, significantly reducing the actual number of interactions and the amount of computation.

