Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and only requires access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even on common dense-reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method can also perform reward-free exploration efficiently. Our code can be found at https://github.com/yudasong/PCMLP.