Recent advances in Deep Reinforcement Learning (DRL) have largely focused on improving the performance of agents with the aim of replacing humans in known and well-defined environments. The use of these techniques as a game design tool for video game production, where the aim is instead to create Non-Player Character (NPC) behaviors, has received relatively little attention until recently. Turn-based strategy games such as Roguelikes, for example, present unique challenges to DRL. In particular, the categorical nature of their complex game state, composed of many entities with different attributes, requires agents that can learn to compare and prioritize these entities. Moreover, this complexity often leads to agents that overfit to states seen during training and fail to generalize when the design changes during development. In this paper we propose two network architectures which, when combined with a procedural loot generation system, handle complex categorical state spaces better and reduce the need for retraining forced by design decisions. The first is based on a dense embedding of the categorical input space that abstracts the discrete observation model and makes trained agents better able to generalize. The second is more general and is based on a Transformer network able to reason relationally about inputs and their attributes. Our experimental evaluation demonstrates that the new agents adapt better than a baseline architecture, making this framework more robust to dynamic gameplay changes during development. Based on the results shown in this paper, we believe that these solutions represent a step towards making DRL more accessible to the gaming industry.
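To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' architecture. It assumes each entity is described by two categorical attributes (a hypothetical item type and rarity), maps each attribute to a dense vector through a lookup table, and then applies single-head scaled dot-product self-attention across entities. The vocabulary sizes, the embedding dimension, the random initialisation and the absence of learned query/key/value projections are all illustrative assumptions.

```python
import math
import random

# Hypothetical vocabulary sizes and embedding width (not taken from the paper).
NUM_ITEM_TYPES = 5
NUM_RARITIES = 3
EMBED_DIM = 4

rng = random.Random(0)


def make_table(num_categories, dim):
    """Create a randomly initialised embedding table, one vector per category."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_categories)]


type_table = make_table(NUM_ITEM_TYPES, EMBED_DIM)
rarity_table = make_table(NUM_RARITIES, EMBED_DIM)


def embed_entity(item_type, rarity):
    """Look up each categorical attribute and concatenate the dense vectors."""
    return type_table[item_type] + rarity_table[rarity]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys and values are the entity vectors
    themselves; a trained model would use learned projections.
    """
    dim = len(entities[0])
    outputs, weights = [], []
    for query in entities:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in entities]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * entities[j][d] for j in range(len(entities)))
                        for d in range(dim)])
    return outputs, weights


# Three entities given only as categorical indices: (item_type, rarity).
observation = [(0, 2), (3, 1), (4, 0)]
embedded = [embed_entity(t, r) for t, r in observation]
attended, attn_weights = self_attention(embedded)
```

In this sketch an entity with a category combination never seen together during training still maps to a vector built from the same shared tables, which is one intuition for why dense embeddings may generalize better than a one-hot encoding of the full state. The attention step lets each entity's representation depend on the others, a rough stand-in for comparing and prioritizing entities.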