Episodic Self-Imitation Learning with Hindsight

Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. The original self-imitation learning algorithm samples good state-action pairs from the experience replay buffer. Our agent instead leverages entire episodes with hindsight to aid self-imitation learning, and a selection module filters out uninformative samples from each episode used in an update. The proposed method overcomes the limitations of standard self-imitation learning, a transition-based method that performs poorly in continuous control environments with sparse rewards. Experiments show that episodic self-imitation learning outperforms baseline on-policy algorithms and achieves performance comparable to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. Because it can solve sparse-reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems with continuous action spaces, such as robot guidance and manipulation.
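The combination of whole-episode hindsight and sample selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The "final" relabeling strategy (borrowed from hindsight experience replay), the 0/-1 sparse reward with a distance threshold `SUCCESS_EPS`, and the selection rule that keeps a relabeled step only when its hindsight return-to-go exceeds the original one are all assumptions. Names such as `Step`, `relabel_final` and `select_hindsight_steps` are hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]

SUCCESS_EPS = 0.05  # assumed distance threshold for "goal reached"


@dataclass(frozen=True)
class Step:
    state: Vec
    action: Vec
    achieved_goal: Vec  # goal actually reached after this action
    goal: Vec           # goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, eps: float = SUCCESS_EPS) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) < eps else -1.0


def relabel_final(episode: List[Step]) -> List[Step]:
    """Hindsight relabeling ('final' strategy): treat the goal the episode
    actually ended at as the intended goal and recompute every reward."""
    new_goal = episode[-1].achieved_goal
    return [
        replace(s, goal=new_goal, reward=sparse_reward(s.achieved_goal, new_goal))
        for s in episode
    ]


def returns_to_go(episode: List[Step], gamma: float = 0.98) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    out: List[float] = []
    g = 0.0
    for s in reversed(episode):
        g = s.reward + gamma * g
        out.append(g)
    out.reverse()
    return out


def select_hindsight_steps(episode: List[Step], gamma: float = 0.98) -> List[Step]:
    """Hypothetical selection rule: keep a relabeled step for self-imitation
    only where its hindsight return-to-go beats the original return-to-go;
    all other relabeled steps are dropped as uninformative."""
    relabeled = relabel_final(episode)
    r_hindsight = returns_to_go(relabeled, gamma)
    r_original = returns_to_go(episode, gamma)
    return [
        s for s, g_h, g_o in zip(relabeled, r_hindsight, r_original) if g_h > g_o
    ]
```

In a failed episode every relabeled step gains return, so all of them are kept. In an episode that already reached its goal, relabeling changes nothing and the rule keeps no extra hindsight samples, which is one way such a module could avoid imitating undesirable hindsight experience.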

翻译：Episod 自我缩进学习,是一种带有轨迹选择模块和适应性损失功能的新型自我缩进算法,旨在加速强化学习。与最初的自我缩进学习算法相比,我们的代理商用事后观察来利用整个过程来帮助自我缩进学习。引入了一个选择模块来过滤每个更新插件的非信息样本。拟议方法克服了标准自我缩进学习算法的局限性,这种过渡性方法在以稀有的奖励处理连续控制环境方面表现不佳。从实验中可以看出,缩进式自我缩进学习比基线政策算法表现得更好,在一些模拟机器人控制任务中取得了与最新非政策算法的相似的性能。轨迹选择模块可以防止代理商学习不可取的自我缩进体验。在连续控制环境中解决稀有的奖赏问题的能力,缩进式自我缩学习具有潜力,可以应用到具有连续操作空间、像机器人这样的机器人操纵等实际世界问题。