Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision-making problems in game playing and robotics have been solved with deep model-free methods. Unfortunately, the sample complexity of model-free methods is often high. To reduce the number of environment samples, model-based reinforcement learning builds an explicit model of the environment dynamics. Achieving high model accuracy is a challenge in high-dimensional problems. In recent years, a diverse landscape of model-based methods has been introduced to improve model accuracy, using techniques such as uncertainty modeling, model-predictive control, latent models, and end-to-end learning and planning. Some of these methods succeed in achieving high accuracy at low sample complexity; most do so in either a robotics or a games context. In this paper, we survey these methods; we explain in detail how they work and what their strengths and weaknesses are. We conclude with a research agenda for future work to make the methods more robust and more widely applicable to other applications.