Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
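As a rough illustration of the stratified-sampling idea described above, the sketch below splits a replay buffer into a default table plus per-event tables and draws a fixed fraction of each mini-batch from the event tables. This is not the paper's implementation; the class, method, and parameter names (`StratifiedEventReplay`, `event_fraction`) are hypothetical, and details such as storing the subsequences leading up to each event are omitted.

```python
import random
from collections import deque

class StratifiedEventReplay:
    """Minimal sketch: an ER buffer split into a default table plus
    per-event tables, sampled with a fixed stratification ratio."""

    def __init__(self, capacity, event_names, event_fraction=0.4):
        # One bounded table per event, plus a default table for all other data.
        self.tables = {name: deque(maxlen=capacity) for name in event_names}
        self.tables["default"] = deque(maxlen=capacity)
        self.event_fraction = event_fraction  # share of each batch from event tables

    def add(self, transition, events=()):
        """Store a transition; also copy it into every event table it triggers."""
        self.tables["default"].append(transition)
        for name in events:
            self.tables[name].append(transition)

    def sample(self, batch_size):
        """Draw a stratified mini-batch: a fixed fraction from the (non-empty)
        event tables, split evenly, with the remainder from the default table."""
        event_tables = [t for n, t in self.tables.items() if n != "default" and t]
        n_event = int(batch_size * self.event_fraction) if event_tables else 0
        batch = []
        for i in range(n_event):
            batch.append(random.choice(event_tables[i % len(event_tables)]))
        while len(batch) < batch_size and self.tables["default"]:
            batch.append(random.choice(self.tables["default"]))
        return batch
```

In this sketch, uniform sampling within each table could be replaced by a prioritized scheme to mirror the combination with prioritized sampling mentioned in the abstract.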