Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey

Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision-making problems have been solved in tasks such as game playing and robotics. Unfortunately, the sample complexity of most deep reinforcement learning methods is high, precluding their use in some important applications. Model-based reinforcement learning creates an explicit model of the environment dynamics to reduce the need for environment samples. Current deep learning methods use high-capacity networks to solve high-dimensional problems. However, high-capacity models typically require many samples, negating the potential benefit of lower sample complexity in model-based methods. A challenge for deep model-based methods is therefore to achieve high predictive power while maintaining low sample complexity. In recent years, many model-based methods have been introduced to address this challenge. In this paper, we survey the contemporary model-based landscape. First, we discuss definitions and relations to other fields. We propose a taxonomy based on three approaches: using explicit planning on given transitions, using explicit planning on learned transitions, and end-to-end learning of both planning and transitions. We use these approaches to organize a comprehensive overview of important recent developments such as latent models. We describe methods and benchmarks, and we suggest directions for future work for each of the approaches. Among promising research directions are curriculum learning, uncertainty modeling, and the use of latent models for transfer learning.
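
To make the second approach in the taxonomy (explicit planning on learned transitions) concrete, the following is a minimal sketch and not a method taken from the survey. It assumes a hypothetical five-state chain environment and swaps the deep network for a tabular count-based model, so that the two stages stay visible: fit a transition model from a fixed budget of environment samples, then plan on that model alone with value iteration. All names (`env_step`, `fit_model`, `plan`, `learn_and_plan`) are illustrative.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor


def env_step(state, action):
    """Toy ground-truth environment (an assumption made for illustration)."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def collect_transitions(n_samples, rng):
    """Draw (s, a, r, s') tuples from the real environment with random actions."""
    data = []
    for _ in range(n_samples):
        s = rng.randrange(N_STATES - 1)  # start in a non-terminal state
        a = rng.choice(ACTIONS)
        s2, r = env_step(s, a)
        data.append((s, a, r, s2))
    return data


def fit_model(data):
    """Learn a tabular model: empirical next-state probabilities and mean reward."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in data:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, next_counts in counts.items():
        total = sum(next_counts.values())
        probs = {s2: c / total for s2, c in next_counts.items()}
        model[sa] = (probs, reward_sum[sa] / total)
    return model


def q_value(model, values, s, a):
    """One-step lookahead under the learned model; unseen pairs default to 0."""
    if (s, a) not in model:
        return 0.0
    probs, mean_reward = model[(s, a)]
    return mean_reward + GAMMA * sum(p * values[s2] for s2, p in probs.items())


def plan(model, n_iters=100):
    """Value iteration on the learned model; uses no further environment samples."""
    values = [0.0] * N_STATES
    for _ in range(n_iters):
        new_values = [0.0] * N_STATES  # terminal state keeps value 0
        for s in range(N_STATES - 1):
            new_values[s] = max(q_value(model, values, s, a) for a in ACTIONS)
        values = new_values
    policy = [
        max(ACTIONS, key=lambda a: q_value(model, values, s, a))
        for s in range(N_STATES - 1)
    ]
    return policy, values


def learn_and_plan(n_samples=200, seed=0):
    """Collect samples once, fit the model, then plan purely on the model."""
    rng = random.Random(seed)
    model = fit_model(collect_transitions(n_samples, rng))
    return plan(model)


if __name__ == "__main__":
    policy, values = learn_and_plan()
    print("greedy action per non-terminal state:", policy)
    print("state values under the learned model:", [round(v, 3) for v in values])
```

In this sketch the only interaction with the real environment happens in `collect_transitions`; every planning sweep afterwards queries the learned model. That is the sample-efficiency argument of the abstract in miniature. The tension the abstract describes arises when the tabular model is replaced by a high-capacity network that itself needs many samples to fit.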
