Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
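To make the buffer layout concrete, the following is a minimal sketch of stratified sampling over event tables, written under assumptions of our own: the class name `StratifiedEventReplay`, the table names, capacities, and sampling proportions are all illustrative and are not taken from the paper's implementation.

```python
import random
from collections import deque

class StratifiedEventReplay:
    """ER buffer split into a default table plus named event tables.

    Hypothetical sketch: event conditions, capacities, and sampling
    proportions are placeholders, not the paper's actual settings.
    """

    def __init__(self, capacity=100_000, event_tables=None):
        # event_tables: dict mapping event name -> (capacity, sampling proportion)
        event_tables = event_tables or {}
        self.default = deque(maxlen=capacity)
        self.tables = {name: deque(maxlen=cap) for name, (cap, _) in event_tables.items()}
        self.props = {name: p for name, (_, p) in event_tables.items()}

    def add(self, transition, events=()):
        """Store a transition; also copy it into every event table whose condition fired."""
        self.default.append(transition)
        for name in events:
            self.tables[name].append(transition)

    def sample(self, batch_size):
        """Draw a stratified mini-batch: a fixed fraction from each event table,
        with the remainder sampled uniformly from the default table."""
        batch = []
        for name, p in self.props.items():
            k = min(int(p * batch_size), len(self.tables[name]))
            batch.extend(random.sample(list(self.tables[name]), k))
        k_rest = min(batch_size - len(batch), len(self.default))
        batch.extend(random.sample(list(self.default), k_rest))
        return batch

# Example (hypothetical event condition): reserve 20% of each mini-batch for
# transitions that reached the goal, sampling the rest uniformly.
buffer = StratifiedEventReplay(event_tables={"goal_reached": (10_000, 0.2)})
```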