Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and a strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to the scaling and speed of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
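To make the decoupling concrete, the following is a minimal sketch of the idea, not the paper's algorithm: an exploration policy is trained on extrinsic plus intrinsic reward and collects all data, while a separate exploitation policy is trained off-policy on the same transitions using only the extrinsic reward, so its objective stays stationary. The chain environment, count-based bonus, and all hyperparameters here are illustrative assumptions.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 20, 19          # sparse reward only at the terminal state
ALPHA, GAMMA, EPS, BETA = 0.5, 0.99, 0.1, 1.0

q_explore = defaultdict(float)   # trained on extrinsic + intrinsic reward
q_exploit = defaultdict(float)   # trained on extrinsic reward only
counts = defaultdict(int)        # state visitation counts for the intrinsic bonus

def step(s, a):                  # actions: 0 = left, 1 = right
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def act(q, s):                   # epsilon-greedy behaviour
    if random.random() < EPS:
        return random.randint(0, 1)
    return max((0, 1), key=lambda a: q[(s, a)])

for episode in range(200):
    s, done = 0, False
    while not done:
        a = act(q_explore, s)                 # data is collected by the exploration policy
        s2, r_ext, done = step(s, a)
        counts[s2] += 1
        r_int = BETA / counts[s2] ** 0.5      # count-based exploration bonus

        # exploration policy learns from the shaped (extrinsic + intrinsic) reward
        target = r_ext + r_int + GAMMA * max(q_explore[(s2, 0)], q_explore[(s2, 1)]) * (not done)
        q_explore[(s, a)] += ALPHA * (target - q_explore[(s, a)])

        # exploitation policy learns off-policy from the same transitions,
        # but only from the extrinsic reward
        target = r_ext + GAMMA * max(q_exploit[(s2, 0)], q_exploit[(s2, 1)]) * (not done)
        q_exploit[(s, a)] += ALPHA * (target - q_exploit[(s, a)])
        s = s2

print("greedy exploitation value at start:", max(q_exploit[(0, 0)], q_exploit[(0, 1)]))
```

In this sketch only the exploration policy's targets depend on the decaying intrinsic bonus; the exploitation policy's value estimates are unaffected by how the bonus is scaled or annealed, which is the robustness property the abstract refers to.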