Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample-efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample-efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 194.3% mean human performance and 109.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience, and outperforms state-based SAC on some tasks of the DMControl 100k benchmark. This is the first time an algorithm achieves super-human performance on Atari games with so little data. EfficientZero's performance is also close to DQN's performance at 200 million frames, while consuming 500 times less data. EfficientZero's low sample complexity and high performance can bring RL closer to real-world applicability. We implement our algorithm in an easy-to-understand manner and it is available at https://github.com/YeWR/EfficientZero. We hope it will accelerate research on MCTS-based RL algorithms in the wider community.
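The aggregate figures above follow the usual Atari 100k convention of human-normalized scores. As a minimal sketch (not code from the paper), the snippet below shows how such mean and median figures are commonly computed; the per-game numbers are placeholders, not the paper's results.

```python
import numpy as np

def human_normalized(agent, random_baseline, human_baseline):
    """Standard normalization: (agent - random) / (human - random)."""
    return (agent - random_baseline) / (human_baseline - random_baseline)

# Placeholder per-game raw scores (illustrative only, NOT the paper's results).
agent_scores  = np.array([120.0,  15.0,  9000.0])
random_scores = np.array([  2.0, -21.0,   160.0])
human_scores  = np.array([ 30.0,  15.0, 13000.0])

norm = human_normalized(agent_scores, random_scores, human_scores)
print(f"mean human-normalized score:   {norm.mean():.1%}")
print(f"median human-normalized score: {np.median(norm):.1%}")
```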