Prioritized experience replay (PER) samples important transitions more frequently, rather than uniformly, to improve the performance of a deep reinforcement learning agent. We claim that such prioritization must be balanced with sample diversity to stabilize the DQN and prevent forgetting. Our proposed improvement over PER, called Predictive PER (PPER), takes three countermeasures (TDInit, TDClip, TDPred) to (i) eliminate priority outliers and explosions and (ii) improve the diversity of samples and their distributions, weighted by priorities, both of which stabilize the DQN. The most notable of the three is the introduction of a second DNN, called TDPred, to generalize in-distribution priorities. An ablation study and full experiments on Atari games show that each countermeasure, in its own way, and PPER as a whole contribute to successfully enhancing stability, and thus performance, over PER.
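The following is a minimal, hedged sketch of how the three countermeasures could sit on top of a standard proportional PER buffer; it is not the authors' implementation, and the names (PriorityBuffer, td_pred) and bounds (p_init, p_max) are hypothetical placeholders chosen for illustration.

```python
import numpy as np

class PriorityBuffer:
    """Proportional prioritized replay with the three countermeasures sketched in."""

    def __init__(self, capacity, p_init=1.0, p_max=10.0, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.p_init, self.p_max = p_init, p_max  # TDInit / TDClip bounds (assumed values)
        self.data, self.prio = [], []

    def add(self, transition):
        # TDInit (sketch): give new transitions a fixed, bounded initial priority
        # rather than the current maximum, avoiding priority explosions at insertion.
        self.data.append(transition)
        self.prio.append(self.p_init)
        if len(self.data) > self.capacity:
            self.data.pop(0)
            self.prio.pop(0)

    def update(self, idx, td_error, td_pred=None):
        p = abs(td_error)
        if td_pred is not None:
            # TDPred (sketch): use a second network's prediction as the priority,
            # generalizing in-distribution priorities instead of the raw |TD error|.
            p = td_pred(self.data[idx])
        # TDClip (sketch): clip priorities to suppress outliers.
        self.prio[idx] = min(p, self.p_max)

    def sample(self, batch_size):
        # Standard proportional sampling, weighted by (clipped) priorities.
        p = np.asarray(self.prio) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return idx, [self.data[i] for i in idx]
```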