The DARPA Subterranean Challenge requires teams of robots to traverse difficult and diverse underground environments. Traversing small gaps is one of the challenging scenarios these robots encounter. Imperfect sensor information makes such gaps difficult for classical navigation methods, whose behaviours require significant manual fine-tuning. In this paper we present a deep reinforcement learning method for autonomously navigating through small gaps, where contact between the robot and the gap may be required. We first learn a gap behaviour policy for getting through small gaps (only centimeters wider than the robot). We then learn a goal-conditioned behaviour selection policy that determines when to activate the gap behaviour policy. We train our policies in simulation and demonstrate their effectiveness with a large tracked robot, both in simulation and on the real platform. In simulation experiments, our approach achieves a 93% success rate when the gap behaviour is activated manually by an operator, and 67% with autonomous activation using the behaviour selection policy. In real robot experiments, our approach achieves a success rate of 73% with manual activation and 40% with autonomous behaviour selection. While we show the feasibility of our approach in simulation, the difference in performance between the simulated and real-world scenarios highlights the difficulty of direct sim-to-real transfer for deep reinforcement learning policies. In both the simulated and real-world environments, alternative methods were unable to traverse the gap.