Representing games through their pixels offers a promising approach for building general-purpose and versatile game models. Games are not merely images, however, and neural network models trained on game pixels often capture differences in the visual style of the image rather than the content of the game. As a result, such models generalize poorly even across similar games of the same genre. In this paper we build on recent advances in contrastive learning and showcase its benefits for representation learning in games. Learning to contrast images of games not only classifies games more efficiently; it also yields models that separate games in a more meaningful fashion by ignoring visual style and focusing, instead, on content. Our results on a large dataset of sports video games, containing 100k images across 175 games and 10 game genres, suggest that contrastive learning is better suited to learning generalized game representations than conventional supervised learning. The findings of this study bring us closer to universal visual encoders for games that can be reused across previously unseen games without requiring retraining or fine-tuning.
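The abstract does not state which contrastive objective is used, so the following is only an illustrative sketch of one common choice, a supervised contrastive loss, written in plain Python. It assumes that images sharing a label (here, hypothetically, the game genre) are treated as positives and all other images in the batch as negatives. The function name, the temperature value, and the toy embeddings are assumptions made for illustration and are not taken from the paper.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    embeddings: list of equal-length float vectors, one per image (assumed
                to come from some visual encoder, not shown here)
    labels:     list of class labels, e.g. game genres (an assumption)
    """
    # L2-normalise so that the dot product equals cosine similarity.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in logits.values()))

        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1

    return total / anchors if anchors else 0.0


# Toy 2-D embeddings: two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
loss_mixed = supervised_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
```

In this toy setting the loss is lower when the labels agree with the clusters (`loss_aligned`) than when they cut across them (`loss_mixed`). Under the stated assumptions, this is the pressure that pushes an encoder to group images by content rather than by surface appearance.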