强力强化强化学习,以持续控制,并有典型的区分错误 (Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification)

Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on. Such effects effectively perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain. This can affect a policy's ability to maximize future rewards as well as the extent to which it satisfies constraints. We refer to this as constrained model misspecification. We present an algorithm with theoretical guarantees that mitigates this form of misspecification, and showcase its performance in multiple Mujoco tasks from the Real World Reinforcement Learning (RWRL) suite.

翻译：此外,现实世界系统往往受到非静止、磨损、未经校准的传感器等效应的影响。这些效应实际上干扰了系统动态,并可能导致一个领域受过培训的政策在被安装到同一领域受扰动的版本时表现不佳。这可能会影响政策在未来获得最大回报的能力及其满足制约的程度。我们将此称为限制模式区分错误。我们提出了一个具有理论保证的算法,可以减少这种定型形式,并在真实世界强化学习套件的多个Mujoco任务中展示其表现。

相关内容

Continuity

关注 4

让 iOS 8 和 OS X Yosemite 无缝切换的一个新特性。 > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.

Source: Apple - iOS 8

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

132+阅读 · 2020年5月14日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日