Deep Reinforcement Learning (Deep RL) has been in the spotlight for the past few years, due to its remarkable ability to solve problems considered practically unsolvable by traditional Machine Learning methods. However, even state-of-the-art Deep RL algorithms have various weaknesses that prevent their extensive use in industry applications, one major weakness being their sample inefficiency. In an effort to address these issues, we integrate a meta-learning technique that shifts the objective from learning to solve a task to learning how to learn to solve a task (or a set of tasks), which we empirically show improves the overall stability and performance of Deep RL algorithms. Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent (meta-learner) that learns how to produce other RL agents (inner-learners) capable of solving given environments. For this task, we convert the typical interaction of an RL agent with its environment into a new, single environment for the meta-learner to interact with. Compared to traditional state-of-the-art Deep RL algorithms, experimental results show that our model achieves remarkable performance in popular OpenAI Gym environments in terms of score and sample efficiency, including the hard-exploration Mountain Car environment.
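The core construction described above, wrapping the inner agent-environment loop as a single environment that the meta-learner interacts with, can be illustrated with a minimal sketch. The snippet below assumes the classic OpenAI Gym API (4-tuple `step`); the names `MetaEnv` and `inner_rollout`, the linear-policy encoding of inner-learners, and the one-step meta-episode are illustrative assumptions, not the paper's actual design.

```python
# A minimal sketch of the meta-environment idea, assuming the classic
# Gym API. MetaEnv, inner_rollout, and the linear-policy encoding of
# inner-learners are illustrative, not the paper's implementation.
import gym
import numpy as np


def inner_rollout(env, weights, episodes=1):
    """Evaluate an inner-learner (here, a linear policy given by `weights`)."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Hypothetical linear policy: score each discrete action
            # against the observation and pick the best one.
            action = int(np.argmax(weights @ obs))
            obs, reward, done, _ = env.step(action)
            total += reward
    return total / episodes


class MetaEnv(gym.Env):
    """Wraps an inner RL task as a single environment for the meta-learner.

    Each meta-action encodes an inner-learner (the parameters of a linear
    policy); the meta-reward is that inner-learner's average return on the
    underlying task.
    """

    def __init__(self, inner_env_id="MountainCar-v0"):
        self.inner_env = gym.make(inner_env_id)
        obs_dim = self.inner_env.observation_space.shape[0]
        n_actions = self.inner_env.action_space.n
        # The meta-learner's action space is the inner policy's weight matrix.
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(n_actions, obs_dim), dtype=np.float32
        )
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32
        )

    def reset(self):
        return np.zeros(1, dtype=np.float32)

    def step(self, meta_action):
        # The meta-action *is* an inner-learner; its return is the meta-reward.
        ret = inner_rollout(self.inner_env, meta_action)
        # One meta-step per episode here, for simplicity of the sketch.
        return np.array([ret], dtype=np.float32), ret, True, {}
```

Under this framing, any off-the-shelf RL algorithm can serve as the meta-learner, since it simply sees `MetaEnv` as an ordinary (if expensive-to-step) environment.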