The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE's use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite. We applied our method to identify a subset of five ALE games, called Atari-5, which produces 57-game median score estimates within 10% of their true values. Extending the subset to ten games recovers 80% of the variance in log-scores across the full 57-game set. We show that this level of compression is possible due to the high degree of correlation between many of the games in ALE.
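The idea of predicting full-suite performance from a small subset can be sketched as a least-squares regression from subset log-scores to the suite-wide median. The sketch below uses synthetic data with a shared "ability" factor to mimic the cross-game correlation the abstract describes; the game indices and data-generating process are illustrative assumptions, not the paper's actual Atari-5 selection or fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: rows are algorithms, columns are games.
# A shared "ability" factor induces the cross-game correlation that
# makes this kind of compression possible.
n_algos, n_games = 40, 57
ability = rng.normal(size=(n_algos, 1))
log_scores = 2.0 * ability + 0.5 * rng.normal(size=(n_algos, n_games))

# Target: median log-score over all 57 games.
target = np.median(log_scores, axis=1)

# Predict the target from a small subset of games via least squares.
subset = [0, 11, 22, 33, 44]  # placeholder indices, NOT the real Atari-5 games
X = np.hstack([log_scores[:, subset], np.ones((n_algos, 1))])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
pred = X @ coef

# Fraction of variance explained (R^2) by the subset regression.
ss_res = np.sum((target - pred) ** 2)
ss_tot = np.sum((target - target.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

When the games share a strong common factor, as the abstract argues they do in ALE, a handful of well-chosen columns explains most of the variance in the suite-wide statistic.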