Model-based reinforcement learning (MBRL) has recently gained immense interest due to its potential for sample efficiency and its ability to incorporate off-policy data. However, designing stable and efficient MBRL algorithms using rich function approximators has remained challenging. To help expose the practical challenges in MBRL and simplify algorithm design from the lens of abstraction, we develop a new framework that casts MBRL as a game between: (1) a policy player, which attempts to maximize rewards under the learned model; and (2) a model player, which attempts to fit the real-world data collected by the policy player. For algorithm development, we construct a Stackelberg game between the two players and show that it can be solved with approximate bi-level optimization. This gives rise to two natural families of MBRL algorithms, depending on which player is chosen as the leader in the Stackelberg game. Together, they encapsulate, unify, and generalize many previous MBRL algorithms. Furthermore, our framework is consistent with, and provides a clear basis for, heuristics known to be important in practice from prior works. Finally, through experiments we validate that our proposed algorithms are highly sample efficient, match the asymptotic performance of model-free policy gradient methods, and scale gracefully to high-dimensional tasks like dexterous hand manipulation. Additional details and code can be obtained from the project page at https://sites.google.com/view/mbrl-game
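To make the Stackelberg view concrete, the sketch below (Python, not the authors' released code) illustrates one plausible iteration of each of the two algorithm families, distinguished by which player leads: a policy-as-leader loop and a model-as-leader loop. Every routine passed in is a hypothetical placeholder for the reader's own environment interface, model class, and policy optimizer, so this is only a minimal sketch of the approximate bi-level structure described above.

```python
# Minimal sketch of the two Stackelberg algorithm families, assuming
# user-supplied routines for data collection, model fitting, and policy
# optimization. All callables here are illustrative placeholders.

def pal_iteration(policy, model, collect_data, fit_model, conservative_policy_step):
    """Policy-as-leader: the follower (model) is fit aggressively to data
    from the current policy; the leader (policy) then takes a small,
    conservative improvement step against that model."""
    data = collect_data(policy)                       # real-world rollouts with current policy
    model = fit_model(model, data)                    # follower solved (near) optimally
    policy = conservative_policy_step(policy, model)  # leader takes a conservative step
    return policy, model

def mal_iteration(policy, model, buffer, collect_data, optimize_policy,
                  conservative_model_step):
    """Model-as-leader: the follower (policy) is optimized aggressively under
    the current model; the leader (model) then takes a small, conservative
    step toward fitting all real-world data gathered so far."""
    policy = optimize_policy(policy, model)           # follower solved (near) optimally
    buffer = buffer + collect_data(policy)            # aggregate real-world rollouts
    model = conservative_model_step(model, buffer)    # leader takes a conservative step
    return policy, model, buffer
```

In both loops the follower is (approximately) solved to optimality while the leader moves conservatively, which is one simple way to realize the approximate bi-level optimization the abstract refers to.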