Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, it remains challenging to apply it to real-world tasks due to its poor sample efficiency. To overcome this shortcoming, several works focus on reusing the collected trajectory data during training by decomposing trajectories into a set of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal since i) the number of such transitions is usually small, and ii) value assignment only happens at the joint states. To address these issues, this paper introduces a concise yet powerful method to construct Continuous Transitions, which exploits trajectory information by leveraging the potential transitions along the trajectory. Specifically, we propose to synthesize new transitions for training by linearly interpolating between consecutive transitions. To keep the constructed transitions authentic, we also develop a discriminator that guides the construction process automatically. Extensive experiments demonstrate that our proposed method achieves a significant improvement in sample efficiency on various complex continuous robotic control problems in MuJoCo and outperforms advanced model-based and model-free RL methods. The source code is available.
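As a rough illustration of the interpolation step described above, the following minimal sketch synthesizes one new transition from two consecutive ones stored as NumPy arrays. The uniform sampling of the coefficient and the function names are assumptions for illustration only; in the proposed method the coefficient is guided automatically by the discriminator, which is not shown here.

```python
import numpy as np

def interpolate_transitions(t0, t1, rng=np.random.default_rng()):
    """Synthesize a continuous transition from two consecutive transitions.

    t0 = (s_t,     a_t,     r_t,     s_{t+1})
    t1 = (s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2})
    Each element is a NumPy array (states/actions) or a float (reward).
    """
    # Hypothetical choice: draw the interpolation coefficient uniformly;
    # the paper instead adapts it with a discriminator.
    lam = rng.uniform(0.0, 1.0)

    s0, a0, r0, ns0 = t0
    s1, a1, r1, ns1 = t1

    # Element-wise convex combination of every component of the two transitions.
    s_new  = lam * s0  + (1.0 - lam) * s1
    a_new  = lam * a0  + (1.0 - lam) * a1
    r_new  = lam * r0  + (1.0 - lam) * r1
    ns_new = lam * ns0 + (1.0 - lam) * ns1
    return s_new, a_new, r_new, ns_new
```

The synthesized tuple can then be added to the replay buffer alongside the original discrete transitions, increasing the effective amount of training data drawn from each trajectory.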