Across the Arcade Learning Environment, Rainbow achieves a level of performance competitive with humans and modern RL algorithms. However, attaining this level of performance requires large amounts of data and hardware resources, making research in this area computationally expensive and use in practical applications often infeasible. This paper's contribution is threefold: we (1) propose an improved version of Rainbow that drastically reduces Rainbow's data, training-time, and compute requirements while maintaining its competitive performance; (2) empirically demonstrate the effectiveness of our approach through experiments on the Arcade Learning Environment; and (3) conduct a number of ablation studies to investigate the effect of each proposed modification individually. Our improved version of Rainbow reaches a median human-normalized score close to classic Rainbow's, while using 20 times less data and requiring only 7.5 hours of training time on a single GPU. We also provide our full implementation, including pre-trained models.
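As a point of reference, the median human-normalized score mentioned above is conventionally computed per game as the agent's score rescaled between a random policy (0.0) and a human baseline (1.0), then aggregated by taking the median across games. The sketch below illustrates that convention; the game names and all numeric values are purely illustrative assumptions, not results from this paper.

```python
# Hedged sketch of the conventional human-normalized score (HNS).
# All numbers below are made up for illustration only.
from statistics import median

def human_normalized_score(agent: float, rand: float, human: float) -> float:
    # 0.0 corresponds to a random policy, 1.0 to the human baseline.
    return (agent - rand) / (human - rand)

# Hypothetical per-game scores: (agent, random baseline, human baseline).
per_game = {
    "Breakout": human_normalized_score(300.0, 1.7, 30.5),
    "Pong": human_normalized_score(20.0, -20.7, 14.6),
    "Seaquest": human_normalized_score(2000.0, 68.4, 42054.7),
}

# The headline metric is the median of the per-game scores.
median_hns = median(per_game.values())
```

Using the median rather than the mean makes the aggregate robust to a few games with extremely high (or low) normalized scores dominating the summary.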