Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding in the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, while modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. But learned visual navigation policies have predominantly been evaluated in simulation. How well do different classes of methods work on a robot? We present a large-scale empirical study of semantic visual navigation methods comparing representative methods from classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We find that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from a 77% success rate in simulation to 23% in the real world due to a large image domain gap between simulation and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: modularity and abstraction in policy design enable Sim-to-Real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks - (A) a large Sim-to-Real gap in images and (B) a disconnect between simulation and real-world error modes - and propose concrete steps forward.