Generalization in reinforcement learning (RL) is of great importance for the real-world deployment of RL algorithms. Various schemes have been proposed to address the generalization issue, including transfer learning, multi-task learning and meta-learning, as well as robust and adversarial reinforcement learning. However, there is neither a unified formulation of these schemes nor a comprehensive comparison of methods across them. In this work, we propose a game-theoretic framework for generalization in reinforcement learning, named GiRL, in which an RL agent is trained against an adversary over a set of tasks, and the adversary can manipulate the distribution over tasks within a given threshold. Under different configurations, GiRL reduces to the various schemes mentioned above. To solve GiRL, we adapt a widely used method in game theory, policy-space response oracles (PSRO), with the following three important modifications: i) we use model-agnostic meta-learning (MAML) as the best-response oracle, ii) we propose a modified projected replicator dynamics, i.e., R-PRD, which ensures that the computed meta-strategy of the adversary stays within the threshold, and iii) we propose a protocol for few-shot learning with the multiple strategies during testing. Extensive experiments on MuJoCo environments demonstrate that our proposed methods outperform existing baselines, e.g., MAML.
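The abstract only states that the adversary's task distribution must stay within a given threshold of some nominal distribution; the exact constraint set used by R-PRD is not specified here. As a minimal illustrative sketch, assuming a per-task deviation bound eps around a nominal distribution p0 (all names and the choice of Euclidean projection are our assumptions, not the paper's method), the projection step that keeps the adversary's meta-strategy inside the threshold could look like this:

```python
import numpy as np

def project_to_constrained_simplex(q, p0, eps, tol=1e-10):
    """Euclidean projection of q onto
    {x : sum(x) = 1, max(0, p0 - eps) <= x <= min(1, p0 + eps)}.

    Illustrative only: the actual constraint set in R-PRD may differ
    (e.g., an L1 or KL ball around p0).
    """
    lo = np.maximum(0.0, p0 - eps)   # per-task lower bounds
    hi = np.minimum(1.0, p0 + eps)   # per-task upper bounds
    # KKT solution has the form x_i = clip(q_i - tau, lo_i, hi_i);
    # bisect on the dual variable tau until the mass sums to 1.
    t_lo, t_hi = q.min() - hi.max(), q.max() - lo.min()
    for _ in range(100):
        tau = 0.5 * (t_lo + t_hi)
        x = np.clip(q - tau, lo, hi)
        s = x.sum()
        if abs(s - 1.0) < tol:
            break
        if s > 1.0:
            t_lo = tau               # too much mass -> increase tau
        else:
            t_hi = tau               # too little mass -> decrease tau
    return x

# Example: an unconstrained replicator-dynamics update concentrated the
# adversary on one task; the projection pulls the distribution back into
# the allowed neighbourhood of the nominal distribution p0.
p0 = np.array([0.25, 0.25, 0.25, 0.25])   # nominal task distribution
q  = np.array([0.70, 0.10, 0.10, 0.10])   # post-update, outside the threshold
print(project_to_constrained_simplex(q, p0, eps=0.1))
```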