Unsupervised Meta-Learning for Reinforcement Learning

Meta-learning builds on multi-task learning to learn how to adapt a model quickly to new tasks. In reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures that solve new problems more efficiently by drawing on experience from prior tasks. The performance of meta-learning algorithms depends critically on the tasks available for meta-training: just as supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks drawn from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning shifts the design burden from algorithm design to task design. If we can automate task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning and an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without manual task design, significantly exceeds the performance of learning from scratch, and even matches the performance of meta-learning methods that use hand-specified task distributions.
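The first half of the recipe, automatic task design, can be illustrated with a small sketch. It assumes, purely for illustration, that tasks are indexed by a latent skill z and that each task's reward takes the discriminator-based form log q(z|s) - log p(z) with a uniform prior; the abstract does not name the exploration technique, so this form is an assumption. The names `toy_discriminator` and `make_tasks` are hypothetical, and the discriminator is a fixed hand-written function standing in for a learned one. The meta-learning step that would consume these tasks is not shown.

```python
import math


def toy_discriminator(state, centers=(-1.0, 1.0)):
    """Hypothetical stand-in for a learned discriminator q(z | s).

    Returns a probability for each skill, computed as a softmax over the
    negative squared distance between a 1-D state and each skill's center.
    """
    logits = [-(state - c) ** 2 for c in centers]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_tasks(discriminator, num_skills):
    """Build one reward function per skill z (assumed reward form).

    Each reward is log q(z | s) - log p(z), with p(z) uniform over skills.
    The resulting list plays the role of a task distribution that a
    meta-learner could train on without hand-specified rewards.
    """
    log_prior = math.log(1.0 / num_skills)

    def make_reward(z):
        def reward(state):
            return math.log(discriminator(state)[z]) - log_prior
        return reward

    return [make_reward(z) for z in range(num_skills)]


tasks = make_tasks(toy_discriminator, num_skills=2)

# A state near skill 0's center is rewarded under task 0, penalized under task 1.
print(tasks[0](-1.0), tasks[1](-1.0))
```

In this toy setting, a state that the discriminator cannot attribute to either skill (here, the midpoint 0.0) receives zero reward under both tasks, so each automatically generated task rewards reaching states that are distinctive for its skill.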
