Passing Through Narrow Gaps with Deep Reinforcement Learning

The DARPA Subterranean Challenge requires teams of robots to traverse difficult and diverse underground environments. Traversing small gaps is one of the challenging scenarios that robots encounter. Imperfect sensor information makes this difficult for classical navigation methods, whose behaviours require significant manual fine-tuning. In this paper we present a deep reinforcement learning method for autonomously navigating through small gaps, where contact between the robot and the gap may be required. We first learn a gap behaviour policy to get through small gaps (only centimeters wider than the robot). We then learn a goal-conditioned behaviour selection policy that determines when to activate the gap behaviour policy. We train our policies in simulation and demonstrate their effectiveness with a large tracked robot in simulation and on the real platform. In simulation experiments, our approach achieves a 93% success rate when the gap behaviour is activated manually by an operator, and 67% with autonomous activation using the behaviour selection policy. In real robot experiments, our approach achieves a success rate of 73% with manual activation, and 40% with autonomous behaviour selection. While we show the feasibility of our approach in simulation, the difference in performance between simulated and real-world scenarios highlights the difficulty of direct sim-to-real transfer for deep reinforcement learning policies. In both the simulated and real-world environments, alternative methods were unable to traverse the gap.
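The two-level structure described in the abstract (a gap behaviour policy plus a goal-conditioned selection policy that decides when to activate it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `BehaviourSelector`, `toy_selection_policy`, `toy_gap_policy` and `toy_default_policy`, the observation, goal and action formats, and the 0.5 threshold do not come from the paper, and the hand-written rules only stand in for the learned neural network policies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Observation = Sequence[float]   # assumed: flattened sensor features
Goal = Sequence[float]          # assumed: goal position relative to the robot
Action = Tuple[float, float]    # assumed: (forward velocity, turn rate)


@dataclass
class BehaviourSelector:
    """Chooses which behaviour controls the robot at each step (illustrative only)."""
    selection_policy: Callable[[Observation, Goal], float]  # score for activating the gap behaviour
    gap_policy: Callable[[Observation], Action]              # stand-in for the gap behaviour policy
    default_policy: Callable[[Observation, Goal], Action]    # stand-in for any other navigation behaviour
    threshold: float = 0.5                                   # assumed activation threshold

    def act(self, obs: Observation, goal: Goal) -> Tuple[str, Action]:
        # The selection policy sees both the observation and the goal.
        p_gap = self.selection_policy(obs, goal)
        if p_gap >= self.threshold:
            return "gap", self.gap_policy(obs)
        return "default", self.default_policy(obs, goal)


# Hand-written stand-ins for the trained policies (hypothetical).
def toy_selection_policy(obs: Observation, goal: Goal) -> float:
    distance_to_gap = obs[0]  # assumed: first feature is the distance to the gap in metres
    return 1.0 if distance_to_gap < 1.0 else 0.0


def toy_gap_policy(obs: Observation) -> Action:
    return (0.2, 0.0)  # creep forward slowly through the gap


def toy_default_policy(obs: Observation, goal: Goal) -> Action:
    return (0.5, 0.0)  # drive toward the goal at normal speed


selector = BehaviourSelector(toy_selection_policy, toy_gap_policy, toy_default_policy)
print(selector.act([3.0], [5.0, 0.0]))  # far from the gap
print(selector.act([0.4], [5.0, 0.0]))  # close to the gap
```

In this sketch, manual activation by an operator would correspond to bypassing `selection_policy` and calling `gap_policy` directly.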

Related content

Deep Reinforcement Learning

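Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The main task of traditional reinforcement learning is for an agent to learn, from the rewards it receives from the environment, the behaviour that maximizes reward. However, traditional model-free reinforcement learning methods rely on function approximation for the agent to learn a value function or a policy. In this setting, the strong function approximation ability of deep learning became a natural replacement for hand-specified features and made better-performing end-to-end learning possible.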
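To make the role of function approximation concrete, here is a minimal sketch of one semi-gradient Q-learning update with a linear approximator over hand-specified features. This is the kind of setup that deep networks replace in DRL. The function names, the feature vectors, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from any particular paper or library.

```python
from typing import List, Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def semi_gradient_q_update(
    weights: Sequence[float],
    features: Sequence[float],
    reward: float,
    next_features_per_action: Sequence[Sequence[float]],
    alpha: float = 0.1,     # assumed learning rate
    gamma: float = 0.99,    # assumed discount factor
    terminal: bool = False,
) -> List[float]:
    """Return new weights after one Q-learning step for the visited (state, action)."""
    if terminal or not next_features_per_action:
        target = reward
    else:
        # Bootstrap from the best estimated action value in the next state.
        best_next = max(q_value(weights, f) for f in next_features_per_action)
        target = reward + gamma * best_next
    td_error = target - q_value(weights, features)
    # For a linear model, the gradient of q_value w.r.t. the weights is the feature vector.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
weights = semi_gradient_q_update(weights, [1.0, 0.0], reward=1.0,
                                 next_features_per_action=[], terminal=True)
print(weights)
```

In DRL, `q_value` would be a deep network applied to raw observations, and the update would be a gradient step on the same temporal-difference error.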
