It has recently been shown that reinforcement learning can be used to train generators capable of producing high-quality game levels, where quality is defined by a user-specified heuristic. To ensure that these generators' output is sufficiently diverse (that is, that they do not simply reproduce a single optimal level configuration), the generation process is constrained such that the initial seed results in some variance in the generator's output. However, this comes at the cost of the human user's control over the generated content. We propose to train generators capable of producing controllably diverse output by making them "goal-aware." To this end, we add conditional inputs representing how close a generator is to a goal defined in terms of some heuristic, and modify the reward mechanism to incorporate that value. Testing on multiple domains, we show that the resulting level generators are capable of exploring the space of possible levels in a targeted, controllable manner, producing levels that are comparable in quality to those of their goal-unaware counterparts and that are diverse along designer-specified dimensions.
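The goal-aware idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that a level is summarized by a dictionary of heuristic metrics, that the designer supplies a target value per metric, that the conditional input is the signed gap to each target, and that the reward is the reduction in total gap produced by an edit. The function names and the metric names `path_length` and `enemies` are hypothetical.

```python
from typing import Dict, List


def goal_distance(metrics: Dict[str, float], targets: Dict[str, float]) -> float:
    """Sum of absolute gaps between current metric values and their targets."""
    return sum(abs(targets[name] - metrics[name]) for name in targets)


def conditional_inputs(metrics: Dict[str, float], targets: Dict[str, float]) -> List[float]:
    """Signed gap per targeted metric (assumed form of the conditional input).

    These values would be appended to the generator's observation so the
    policy can see how far, and in which direction, it is from each goal.
    Metrics are ordered by name so the layout of the vector is stable.
    """
    return [float(targets[name] - metrics[name]) for name in sorted(targets)]


def goal_aware_reward(
    prev_metrics: Dict[str, float],
    new_metrics: Dict[str, float],
    targets: Dict[str, float],
) -> float:
    """Positive when an edit moves the level closer to the targets (assumed form)."""
    return goal_distance(prev_metrics, targets) - goal_distance(new_metrics, targets)


# Hypothetical example: the designer asks for a path of length 20 with 3 enemies.
targets = {"path_length": 20, "enemies": 3}
before = {"path_length": 12, "enemies": 5}
after = {"path_length": 15, "enemies": 4}

reward = goal_aware_reward(before, after, targets)
observation_extra = conditional_inputs(after, targets)
```

Under these assumptions, changing `targets` at inference time would steer the same trained generator toward different regions of level space, which is the kind of designer-specified control the abstract refers to.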