Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver

Although model-based reinforcement learning (RL) approaches are considered more sample efficient, existing algorithms usually rely on sophisticated planning algorithms that are tightly coupled with the model-learning procedure. Hence the learned models may not be reusable with more specialized planners. In this paper we address this issue and provide approaches to learn an RL model efficiently without the guidance of a reward signal. In particular, we take a plug-in solver approach: we focus on learning a model in the exploration phase and demand that \emph{any planning algorithm} run on the learned model yields a near-optimal policy. Specifically, we focus on the linear mixture MDP setting, where the probability transition matrix is an (unknown) convex combination of a set of existing models. We show that, with a novel exploration algorithm, the plug-in approach learns a model by taking $\tilde{O}(d^2H^3/\epsilon^2)$ interactions with the environment, and \emph{any} $\epsilon$-optimal planner on the model gives an $O(\epsilon)$-optimal policy on the original model. This sample complexity matches lower bounds for non-plug-in approaches and is \emph{statistically optimal}. We achieve this result by leveraging a careful maximum total-variance bound using Bernstein's inequality and properties specific to linear mixture MDPs.
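The linear mixture structure and the plug-in pipeline can be illustrated with a small NumPy sketch. It assumes a tabular toy instance in which the transition kernel is $P(s' \mid s,a) = \sum_{i=1}^{d} \theta_i P_i(s' \mid s,a)$ with $\theta$ on the probability simplex. The exploration step is a deliberately simple stand-in (uniformly random actions plus ridge regression of one-hot next-state indicators on the base models, projected back onto the simplex), not the paper's Bernstein-based exploration algorithm, so it carries none of the $\tilde{O}(d^2H^3/\epsilon^2)$ guarantees. All function names and sizes, and the choice of finite-horizon value iteration as the plug-in planner, are hypothetical.

```python
import numpy as np

# Hypothetical toy sizes: S states, A actions, d base models, horizon H.
S, A, d, H = 6, 3, 4, 5


def random_mixture_mdp(S, A, d, rng):
    """Draw d base models P_i(.|s,a) and unknown convex weights theta."""
    base = rng.random((d, S, A, S))
    base /= base.sum(axis=-1, keepdims=True)  # each P_i(.|s,a) sums to 1
    theta = rng.dirichlet(np.ones(d))         # theta lies on the simplex
    return base, theta


def mixture_transition(base, theta):
    """P(s'|s,a) = sum_i theta_i * P_i(s'|s,a), shape (S, A, S)."""
    return np.einsum("i,isat->sat", theta, base)


def project_to_simplex(v):
    """Euclidean projection onto {x >= 0, sum(x) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    cond = u - (css - 1.0) / k > 0
    rho = k[cond][-1]
    tau = (css[cond][-1] - 1.0) / rho
    return np.maximum(v - tau, 0.0)


def explore(base, P_true, H, episodes, rng):
    """Reward-free data collection with uniformly random actions (simple stand-in).

    Each (s, a, s') yields E[1{s' = t}] = sum_i theta_i * P_i(t|s,a) for every t;
    we accumulate the ridge-regression statistics for these linear equations.
    """
    n_states, n_actions, _ = P_true.shape
    gram = np.zeros((base.shape[0], base.shape[0]))
    target = np.zeros(base.shape[0])
    for _ in range(episodes):
        s = 0  # fixed initial state
        for _ in range(H):
            a = rng.integers(n_actions)
            s_next = rng.choice(n_states, p=P_true[s, a])
            X = base[:, s, a, :].T           # (S, d): column i is P_i(.|s,a)
            gram += X.T @ X
            target += base[:, s, a, s_next]  # equals X.T @ onehot(s_next)
            s = s_next
    return gram, target


def estimate_theta(gram, target, lam=1e-3):
    """Ridge estimate of theta, projected back onto the simplex."""
    theta_ridge = np.linalg.solve(gram + lam * np.eye(len(target)), target)
    return project_to_simplex(theta_ridge)


def plan(P, R, H):
    """Finite-horizon value iteration, one arbitrary choice of plug-in planner."""
    V = np.zeros((H + 1, P.shape[0]))
    pi = np.zeros((H, P.shape[0]), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = R + P @ V[h + 1]  # (S, A)
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V


def evaluate(P, R, pi, H):
    """Value of a fixed non-stationary policy pi under model P, per start state."""
    idx = np.arange(P.shape[0])
    V = np.zeros(P.shape[0])
    for h in range(H - 1, -1, -1):
        V = R[idx, pi[h]] + P[idx, pi[h]] @ V
    return V


rng = np.random.default_rng(0)
base, theta = random_mixture_mdp(S, A, d, rng)
P_true = mixture_transition(base, theta)

# Exploration phase: no reward is observed.
gram, target = explore(base, P_true, H, episodes=2000, rng=rng)
theta_hat = estimate_theta(gram, target)
P_hat = mixture_transition(base, theta_hat)

# Planning phase: a reward is revealed only now and handed to the planner.
R = rng.random((S, A))
pi_hat, _ = plan(P_hat, R, H)
pi_opt, V_opt = plan(P_true, R, H)
gap = V_opt[0, 0] - evaluate(P_true, R, pi_hat, H)[0]
print("theta:", theta.round(3), "theta_hat:", theta_hat.round(3), "gap:", gap)
```

Because exploration never reads `R`, the same `theta_hat` can be reused with any other reward, and `plan` can be swapped for a different solver without touching the exploration code; that separation is what the plug-in formulation asks for. The printed numbers depend on the seed and are not results from the paper.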

