In this paper we revisit endless online level generation with the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework, motivated by the observation that EDRL tends to generate recurrent patterns. Inspired by this phenomenon, we formulate a notion of state space closure, meaning that any state that may appear in an infinite-horizon online generation process can already be found within a finite horizon. Through theoretical analysis, we find that although state space closure raises a concern about diversity, it enables an EDRL model trained on a finite horizon to generalise to the infinite-horizon scenario without deterioration of content quality. Moreover, we verify the quality and diversity of content generated by EDRL via empirical studies on the widely used Super Mario Bros. benchmark. Experimental results reveal that the ability of the current EDRL approach to generate diverse game levels is limited by state space closure, whereas it does not suffer from reward deterioration even on horizons longer than the training horizon. Based on these findings and analyses, we argue that future work on generating diverse, high-quality content online via EDRL should address the diversity issue on the premise of state space closure, which guarantees quality.
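For concreteness, the notion of state space closure can be sketched formally as follows (our own illustrative notation, not taken verbatim from the paper): writing $\mathcal{S}_t$ for the set of states that may appear at step $t$ of the online generation process, closure asserts that

\[
\exists\, T < \infty \ \text{such that} \ \bigcup_{t=1}^{\infty} \mathcal{S}_t \;=\; \bigcup_{t=1}^{T} \mathcal{S}_t ,
\]

i.e., every state reachable in the infinite-horizon process is already reachable within the first $T$ steps, so no genuinely new states (and hence no new patterns) arise beyond the finite horizon.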