Reinforcement Learning with Dual-Observation for General Video Game Playing

From arXiv, 12 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Reinforcement learning algorithms have performed well in playing challenging board and video games. A growing body of research focuses on improving the generalisation ability of reinforcement learning algorithms. The General Video Game AI Learning Competition aims at designing agents that are capable of learning to play game levels that were unseen during training. This paper summarises five years of the General Video Game AI Learning Competition. At each edition, three new games were designed. For each game, three test levels were generated by perturbing or combining two training levels. We then present a novel reinforcement learning framework with dual-observation for general video game playing, under the assumption that similar local information is more likely to be observed across different levels than similar global information. Therefore, instead of directly inputting a single, raw pixel-based screenshot of the current game screen, our proposed framework takes the encoded, transformed global and local observations of the game screen as two simultaneous inputs, aiming at learning local information for playing new levels. Our proposed framework is implemented with three state-of-the-art reinforcement learning algorithms and tested on the game set of the 2020 General Video Game AI Learning Competition. Ablation studies show that using the encoded, transformed global and local observations as input yields outstanding performance. The best-performing agent overall is further used as a baseline in the 2021 competition edition.
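To make the dual-observation idea concrete, here is a minimal sketch of how a global and a local observation might be built from one game state. The abstract does not specify the encoding or how the local view is chosen, so everything here is an assumption for illustration: the game screen is taken to be already converted into a 2-D grid of integer tile ids, the encoding is a plain one-hot, the local view is a square window centred on the avatar, and the names `one_hot`, `local_window`, `dual_observation`, `num_tile_types`, and `radius` are hypothetical.

```python
import numpy as np


def one_hot(grid, num_tile_types):
    """Encode an (H, W) grid of integer tile ids as an (H, W, C) one-hot array."""
    # Indexing the identity matrix with the grid picks one row per cell.
    return np.eye(num_tile_types, dtype=np.float32)[grid]


def local_window(encoded, center, radius):
    """Crop a (2r+1, 2r+1, C) window around `center`, zero-padded past the edges."""
    # Pad the two spatial axes only, so the window never runs off the level.
    padded = np.pad(encoded, ((radius, radius), (radius, radius), (0, 0)))
    row, col = center
    # After padding, the original cell (row, col) sits at (row + r, col + r),
    # so the window centred on it starts at (row, col) in padded coordinates.
    return padded[row:row + 2 * radius + 1, col:col + 2 * radius + 1]


def dual_observation(grid, avatar_pos, num_tile_types, radius=2):
    """Return (global_obs, local_obs) for one game state (illustrative only)."""
    encoded = one_hot(grid, num_tile_types)
    return encoded, local_window(encoded, avatar_pos, radius)


if __name__ == "__main__":
    # Hypothetical 3x4 level: 0 = floor, 1 = wall, 2 = avatar.
    level = np.array([[1, 1, 1, 1],
                      [1, 2, 0, 1],
                      [1, 1, 1, 1]])
    global_obs, local_obs = dual_observation(level, (1, 1), num_tile_types=3, radius=1)
    print(global_obs.shape, local_obs.shape)
```

In a setup like this, the two arrays would feed separate input branches of the policy network. The local window keeps the same shape regardless of level size, which is one plausible reason local information could transfer to unseen levels more readily than the global view.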
