Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration and even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE, using more general features and re-sampling the noise periodically, which leads to a new exploration method: generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE enables a compromise between performance and smoothness, which allows training directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.
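To make the two extensions concrete, the following is a minimal, illustrative sketch of state-dependent noise with periodic re-sampling; the dimensions, the `sde_sample_freq` value, and the helper `sample_exploration_matrix` are assumptions made for this example, and the actual implementation is the one in stable-baselines3.

```python
import torch

feature_dim, action_dim = 64, 6
log_std = torch.zeros(feature_dim, action_dim)  # learned noise scale (fixed here for illustration)

def sample_exploration_matrix():
    # Hypothetical helper: draw a weight matrix theta_eps ~ N(0, sigma^2),
    # kept fixed between re-samplings so the noise stays a smooth function of the features.
    return torch.randn(feature_dim, action_dim) * log_std.exp()

theta_eps = sample_exploration_matrix()
sde_sample_freq = 4  # re-sample the noise matrix every 4 environment steps (assumed value)

for step in range(100):
    if step % sde_sample_freq == 0:
        # Periodic re-sampling: one of the two proposed extensions to the original SDE.
        theta_eps = sample_exploration_matrix()
    # "More general features" extension: noise depends on policy features rather than the raw state;
    # random features stand in for them in this sketch.
    features = torch.randn(feature_dim)
    exploration_noise = features @ theta_eps  # state-dependent noise added to the deterministic action
    # action = policy_mean(state) + exploration_noise
```

In this sketch, increasing `sde_sample_freq` keeps the same noise matrix for longer, trading exploration diversity for smoother motions, which is the compromise the abstract refers to.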