Robust Reinforcement Learning for General Video Game Playing

From arXiv, 10 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Reinforcement learning has successfully learned to play challenging board and video games. However, its generalization ability remains under-explored. The General Video Game AI Learning Competition aims at designing agents that are capable of learning to play different game levels that were unseen during training. This paper presents the games, entries and results of the 2020 General Video Game AI Learning Competition, held at the Sixteenth International Conference on Parallel Problem Solving from Nature and the 2020 IEEE Conference on Games. Three new games, with sparse, periodic and dense rewards respectively, were designed for this competition, and the test levels were generated by adding minor perturbations to training levels or by combining training levels. In this paper, we also design a reinforcement learning agent, called Arcane, for general video game playing. We assume that similar local information is more likely than similar global information to recur across different levels. Therefore, instead of directly taking a single, raw pixel-based screenshot of the current game screen as input, Arcane takes the encoded, transformed global and local observations of the game screen as two simultaneous inputs, aiming at learning local information that carries over to new levels. Two versions of Arcane, using a stochastic or a deterministic policy for decision-making at test time, both show robust performance on the game set of the 2020 General Video Game AI Learning Competition.
