In this paper, we revisit endless online level generation through the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework. Inspired by the observation that EDRL tends to generate recurrent patterns, we formulate the notion of state space closure, which implies that any state that may appear in an infinite-horizon online generation process can already be found within a finite horizon. Through theoretical analysis, we find that although state space closure raises a concern about diversity, it allows an EDRL generator trained with a finite horizon to generalise to the infinite-horizon scenario without deterioration of content quality. Moreover, we empirically verify the quality and diversity of levels generated by EDRL on the widely used Super Mario Bros. benchmark. Experimental results reveal that the diversity of levels generated by EDRL is limited due to state space closure, whereas their quality does not deteriorate over a horizon longer than the one specified during training. Based on our outcomes and analysis, we conclude that future work on endless online level generation via reinforcement learning should address the issue of diversity while preserving state space closure and content quality.
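As an illustrative formalisation (our own notation for this sketch, not necessarily the paper's): assuming the online generation process is modelled as a Markov chain over a state space $\mathcal{S}$, and writing $\mathrm{Reach}_t \subseteq \mathcal{S}$ for the set of states that occur with positive probability at step $t$, state space closure can be phrased as the reachable set saturating after finitely many steps,
\[
\exists\, T < \infty \ \text{such that}\ \bigcup_{t=0}^{\infty} \mathrm{Reach}_t \;=\; \bigcup_{t=0}^{T} \mathrm{Reach}_t .
\]
In words, every state that can occur in the infinite-horizon process already occurs within the finite horizon $T$; this is why quality guarantees established over a finite training horizon can carry over to endless generation, while diversity is bounded by the finite reachable set.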