Experience replay-based sampling techniques are essential to several reinforcement learning (RL) algorithms, since they aid convergence by breaking spurious correlations in the training data. The most popular techniques, uniform experience replay (UER) and prioritized experience replay (PER), appear to suffer from sub-optimal convergence and significant bias error, respectively. To alleviate this, we introduce a new experience replay method for reinforcement learning, called Introspective Experience Replay (IER). IER selects batches of consecutive data points that immediately precede 'surprising' points. Our approach builds on the theoretically rigorous reverse experience replay (RER), which provably removes bias in the linear-approximation setting but can be sub-optimal with neural approximation. We show empirically that IER is stable under neural function approximation and outperforms state-of-the-art techniques such as UER, PER, and hindsight experience replay (HER) on the majority of tasks.
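To make the sampling idea concrete, below is a minimal sketch of IER-style batch selection based only on the description above. The surprise metric (here, the magnitude of a per-transition TD error), the buffer layout, and the function name `ier_sample` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of IER-style batch selection (assumptions: TD error as the
# surprise metric, a flat list buffer of transitions). Not the paper's code.
import numpy as np

def ier_sample(buffer, td_errors, batch_size, num_batches):
    """Pick the most 'surprising' transitions, then return batches made of
    the consecutive transitions that lead up to each surprising point."""
    td_errors = np.asarray(td_errors)
    # Indices of the top-k surprising transitions (largest |TD error|).
    pivots = np.argsort(-np.abs(td_errors))[:num_batches]
    batches = []
    for p in pivots:
        start = max(0, p - batch_size + 1)
        # Consecutive transitions immediately preceding (and including) the pivot.
        batches.append([buffer[i] for i in range(start, p + 1)])
    return batches

# Toy usage: a buffer of (state, action, reward, next_state) tuples.
rng = np.random.default_rng(0)
toy_buffer = [(s, 0, rng.normal(), s + 1) for s in range(1000)]
toy_td_errors = rng.normal(size=1000)
batches = ier_sample(toy_buffer, toy_td_errors, batch_size=32, num_batches=4)
print([len(b) for b in batches])
```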