Experience replay-based sampling techniques are essential to several reinforcement learning (RL) algorithms, since they aid convergence by breaking spurious correlations. The most popular techniques, uniform experience replay (UER) and prioritized experience replay (PER), suffer from sub-optimal convergence and significant bias error, respectively. To alleviate this, we introduce a new experience replay method for reinforcement learning, called Introspective Experience Replay (IER). IER picks batches of data points occurring consecutively before 'surprising' points. Our proposed approach builds on the theoretically rigorous reverse experience replay (RER), which provably removes bias in the linear approximation setting but can be sub-optimal with neural function approximation. We show empirically that IER is stable with neural function approximation and outperforms state-of-the-art techniques such as UER, PER, and hindsight experience replay (HER) on the majority of tasks.
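To make the batch-selection idea concrete, the following is a minimal illustrative sketch of how IER-style batches could be formed, assuming 'surprise' is a per-transition score such as the absolute TD error; the function name, the surprise metric, and the parameters are assumptions for illustration only, not the paper's exact procedure.

```python
import numpy as np

def ier_batches(buffer, surprise, batch_size, num_batches):
    """Illustrative sketch of IER-style batch selection (assumptions noted above).

    `buffer`   : list of stored transitions, in temporal order.
    `surprise` : per-transition score (e.g. absolute TD error -- assumed metric).
    Each returned batch holds the `batch_size` transitions immediately
    preceding (and including) one of the most surprising points,
    kept in temporal order.
    """
    surprise = np.asarray(surprise, dtype=float)
    # Indices of the most surprising transitions, used as batch endpoints.
    pivots = np.argsort(surprise)[::-1][:num_batches]
    batches = []
    for p in pivots:
        start = max(0, p - batch_size + 1)
        # Consecutive transitions ending at the surprising point.
        batches.append([buffer[i] for i in range(start, p + 1)])
    return batches
```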