Efficient exploration has been a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control and proposes a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces interactions with the environment using a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.