Visual navigation by mobile robots is classically tackled through SLAM plus optimal planning, and more recently through end-to-end training of policies implemented as deep networks. The former approaches are often limited to waypoint planning but have proven efficient even in real physical environments; the latter are most frequently employed in simulation but have been shown to learn more complex visual reasoning involving complex semantic regularities. Navigation by real robots in physical environments is still an open problem. End-to-end training approaches have been thoroughly tested in simulation only, with experiments involving real robots restricted to rare performance evaluations in simplified laboratory conditions. In this work we present an in-depth study of the performance and reasoning capacities of real physical agents, trained in simulation and deployed to two different physical environments. Beyond benchmarking, we provide insights into the generalization capabilities of different agents trained under different conditions. We visualize sensor usage and the importance of the different types of signals. We show that, for the PointGoal task, an agent pre-trained on a wide variety of tasks and fine-tuned on a simulated version of the target environment can reach competitive performance without modelling any sim2real transfer, i.e. by deploying the trained agent directly from simulation to a real physical robot.