Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by only applying Mixup between transitions with vicinal state-action features. Under NMER, a given transition's set of state-action neighbors is dynamic and episode agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate it on continuous control environments. We observe that NMER improves sample efficiency by an average of 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
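The sketch below illustrates the core NMER sampling step as described above: sample transitions, find each one's nearest neighbor in state-action space, and apply Mixup between the pair. It is a minimal illustration only, assuming Euclidean distance over z-score normalized state-action features, component-wise interpolation of (state, action, reward, next state), and a Beta(α, α) mixing coefficient; the function and parameter names (nmer_sample, mixup_alpha) are hypothetical and not the authors' implementation.

```python
# Minimal sketch of an NMER-style sampling step (illustrative, not the
# authors' code). Assumes transitions are stored as NumPy arrays.
import numpy as np

def nmer_sample(states, actions, rewards, next_states, batch_size,
                mixup_alpha=1.0, rng=np.random.default_rng()):
    """Sample a batch of Mixup-interpolated transitions from the buffer."""
    n = states.shape[0]
    idx = rng.integers(0, n, size=batch_size)

    # State-action features defining the neighborhood (z-score normalized
    # so state and action dimensions are on comparable scales -- an assumption).
    feats = np.concatenate([states, actions], axis=1)
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)

    # Nearest neighbor (excluding the transition itself) of each sampled transition.
    dists = np.linalg.norm(feats[idx, None, :] - feats[None, :, :], axis=-1)
    dists[np.arange(batch_size), idx] = np.inf
    nbr = dists.argmin(axis=1)

    # Mixup coefficients, one per sampled pair.
    lam = rng.beta(mixup_alpha, mixup_alpha, size=(batch_size, 1))

    # Interpolate each transition component between the sample and its neighbor.
    mix = lambda x: lam * x[idx] + (1.0 - lam) * x[nbr]
    return (mix(states), mix(actions),
            lam[:, 0] * rewards[idx] + (1.0 - lam[:, 0]) * rewards[nbr],
            mix(next_states))
```

For clarity this sketch computes pairwise distances over the whole buffer at each call, which is O(batch_size × buffer_size); a practical implementation would presumably maintain an approximate nearest-neighbor index instead.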