We present Visual Navigation and Locomotion over obstacles (ViNL), which enables a quadrupedal robot to navigate unseen apartments while stepping over small obstacles that lie in its path (e.g., shoes, toys, cables), similar to how humans and pets lift their feet over objects as they walk. ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guide the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following the provided velocity commands. Both policies are entirely "model-free", i.e., sensors-to-actions neural networks trained end-to-end. The two are trained independently in two entirely different simulators and then seamlessly co-deployed by feeding the velocity commands from the navigator to the locomotor, entirely "zero-shot" (without any co-training). While prior works have developed learning methods for visual navigation or visual locomotion, to the best of our knowledge, this is the first fully learned approach that leverages vision to accomplish both (1) intelligent navigation in new environments, and (2) intelligent visual locomotion that aims to traverse cluttered environments without disrupting obstacles. On the task of navigation to distant goals in unknown environments, ViNL using just egocentric vision significantly outperforms prior work on robust locomotion using privileged terrain maps (+32.8% success and -4.42 collisions per meter). Additionally, we ablate our locomotion policy to show that each aspect of our approach helps reduce obstacle collisions. Videos and code at http://www.joannetruong.com/projects/vinl.html
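The following is a minimal sketch of the zero-shot hierarchical composition described above: the navigator produces velocity commands that are simply forwarded to the locomotor, which outputs joint targets. All class, method, and parameter names here (NavigationPolicy, LocomotionPolicy, control_loop, the robot interface, and the control rates) are illustrative assumptions, not taken from the released code.

```python
import numpy as np


class NavigationPolicy:
    """Maps egocentric vision + goal coordinate to velocity commands.

    A real policy is a trained sensors-to-actions neural network; this
    stub only documents the assumed interface.
    """

    def act(self, depth_image: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # Returns [linear_velocity, angular_velocity].
        raise NotImplementedError


class LocomotionPolicy:
    """Maps egocentric vision + proprioception + commanded velocities to joint targets."""

    def act(self, depth_image: np.ndarray, proprio: np.ndarray,
            velocity_cmd: np.ndarray) -> np.ndarray:
        # Returns target joint positions for the quadruped's leg joints.
        raise NotImplementedError


def control_loop(robot, nav_policy, loco_policy, goal,
                 nav_hz: int = 10, loco_hz: int = 50, max_steps: int = 100_000):
    """Co-deploy the two independently trained policies, zero-shot:
    the navigator's velocity commands are fed directly to the locomotor."""
    velocity_cmd = np.zeros(2)
    for step in range(max_steps):
        obs = robot.get_observations()
        # Navigation runs at a lower rate than low-level locomotion control.
        if step % (loco_hz // nav_hz) == 0:
            velocity_cmd = nav_policy.act(obs["depth"], goal)
        joint_targets = loco_policy.act(obs["depth"], obs["proprio"], velocity_cmd)
        robot.apply_joint_targets(joint_targets)
```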