Revisiting Design Choices in Model-Based Offline Reinforcement Learning

Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant progress has been made recently in offline model-based reinforcement learning, a family of approaches that leverage a learned dynamics model. This typically involves constructing a probabilistic model, and using the model uncertainty to penalize rewards where there is insufficient data, solving for a pessimistic MDP that lower bounds the true MDP. Existing methods, however, exhibit a breakdown between theory and practice, whereby pessimistic return ought to be bounded by the total variation distance of the model from the true dynamics, but is instead implemented through a penalty based on estimated model uncertainty. This has spawned a variety of uncertainty heuristics, with little to no comparison between differing approaches. In this paper, we compare these heuristics, and design novel protocols to investigate their interaction with other hyperparameters, such as the number of models, or the imaginary rollout horizon. Using these insights, we show that selecting these key hyperparameters using Bayesian Optimization produces superior configurations that are vastly different from those currently used in existing hand-tuned state-of-the-art methods, and result in drastically stronger performance.
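
The reward penalty described in the abstract can be illustrated with a minimal sketch. It assumes an ensemble of Gaussian dynamics models whose per-member predicted means and standard deviations for the next state are already available as NumPy arrays. The function names, the two heuristics chosen, and the penalty weight `lam` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def max_aleatoric(stds):
    """Largest norm of the predicted standard deviation across ensemble members.

    stds: array of shape (n_models, batch, state_dim).
    Returns an array of shape (batch,).
    """
    return np.linalg.norm(stds, axis=-1).max(axis=0)


def max_pairwise_diff(means):
    """Largest L2 distance between the mean predictions of any two members.

    means: array of shape (n_models, batch, state_dim).
    Returns an array of shape (batch,).
    """
    # Broadcast to all member pairs: shape (n_models, n_models, batch, state_dim).
    diffs = means[:, None] - means[None, :]
    return np.linalg.norm(diffs, axis=-1).max(axis=(0, 1))


def penalized_reward(reward, penalty, lam=1.0):
    """Pessimistic reward: subtract the weighted uncertainty estimate.

    lam is a hypothetical penalty weight, one of the hyperparameters
    that would need tuning alongside ensemble size and rollout horizon.
    """
    return reward - lam * penalty


if __name__ == "__main__":
    # Toy ensemble: 2 models, 1 transition, 2-dimensional state.
    means = np.array([[[0.0, 0.0]], [[3.0, 4.0]]])
    stds = np.array([[[0.0, 0.0]], [[3.0, 4.0]]])
    reward = np.array([10.0])

    print(penalized_reward(reward, max_aleatoric(stds), lam=0.5))
    print(penalized_reward(reward, max_pairwise_diff(means), lam=0.5))
```

Both functions turn ensemble outputs into one scalar per transition, so swapping heuristics only changes which function produces `penalty`. How that penalty should be scaled is exactly the kind of interaction with other hyperparameters that the abstract says needs systematic comparison.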

