Adversarial Reinforcement Learning for Procedural Content Generation

We present a new approach, ARLPCG (Adversarial Reinforcement Learning for Procedural Content Generation), which procedurally generates and tests previously unseen environments with an auxiliary input as a control variable. Training RL agents over novel environments is a notoriously difficult task. One popular approach is to procedurally generate different environments to increase the generalizability of the trained agents. ARLPCG instead deploys an adversarial model with one PCG RL agent (called the Generator) and one solving RL agent (called the Solver). The Generator receives a reward signal based on the Solver's performance, which encourages the environment design to be challenging but not impossible. To further drive diversity and control of the environment generation, we propose using auxiliary inputs for the Generator. The benefit is twofold. First, the Solver achieves better generalization through the challenges the Generator creates. Second, the trained Generator can be used as a creator of novel environments that, together with the Solver, can be shown to be solvable. We create two types of 3D environments to validate our model, representing two popular game genres: a third-person platformer and a racing game. In these cases, we show that ARLPCG has a significantly better solve ratio, and that the auxiliary inputs render the level creation controllable to a certain degree. For a video compilation of the results, please visit https://youtu.be/z7q2PtVsT0I.
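The abstract does not spell out the Generator's reward, so the following is only a minimal sketch of one way a reward could favor levels that are "challenging but not impossible" while remaining steerable through an auxiliary input. The function name `generator_reward`, its parameters, the `[-1, 1]` range for the auxiliary input, and the mapping from auxiliary input to a target difficulty are all assumptions made for illustration, not the authors' formulation.

```python
def generator_reward(solved: bool, solver_score: float, aux: float,
                     failure_penalty: float = 1.0) -> float:
    """Hypothetical Generator reward for one generated level.

    solved:       whether the Solver completed the level.
    solver_score: normalized Solver performance in [0, 1]; higher means
                  the level was easier for the Solver (assumed scale).
    aux:          auxiliary control input in [-1, 1]; in this sketch,
                  +1 requests the easiest levels and -1 the hardest.
    """
    if not 0.0 <= solver_score <= 1.0:
        raise ValueError("solver_score must be in [0, 1]")
    if not -1.0 <= aux <= 1.0:
        raise ValueError("aux must be in [-1, 1]")

    # Unsolvable levels are discouraged regardless of the requested difficulty.
    if not solved:
        return -failure_penalty

    # Map the auxiliary input to a target difficulty in [0, 1].
    target_difficulty = (1.0 - aux) / 2.0
    # Treat low Solver performance as a sign of a difficult level.
    observed_difficulty = 1.0 - solver_score
    # Reward is 1 when the observed difficulty matches the target
    # and falls off linearly with the mismatch.
    return 1.0 - abs(observed_difficulty - target_difficulty)


if __name__ == "__main__":
    # Sweep the auxiliary input for a level the Solver found moderately hard.
    for aux in (-1.0, 0.0, 1.0):
        print(aux, generator_reward(solved=True, solver_score=0.5, aux=aux))
```

In this sketch the failure penalty keeps the Generator from producing impossible levels, and the auxiliary input shifts which solvable levels are rewarded most, which is one plausible reading of the control variable described above.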
