This work tackles scene understanding for outdoor robotic navigation, relying solely on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to a specific domain. Thus, we propose to segment egocentric images directly in terms of how a robot can navigate in them, and tailor the learning problem to an autonomous navigation task. Building around an image segmentation network, we present a generic affordance consisting of three driveability levels which broadly apply to both urban and off-road scenes. By encoding these levels with soft ordinal labels, we incorporate inter-class distances during learning, which improves segmentation compared to standard "hard" one-hot labelling. In addition, we propose a navigation-oriented, pixel-wise loss weighting method which assigns higher importance to safety-critical areas. We evaluate our approach on large-scale public image segmentation datasets ranging from sunny city streets to snowy forest trails. In a cross-dataset generalization experiment, we show that our affordance learning scheme can be applied across a diverse mix of datasets and improves driveability estimation in unseen environments compared to general-purpose, single-dataset segmentation.
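To make the two learning components concrete, the sketch below illustrates one common construction of soft ordinal labels (a softmax over negative squared rank distances, in the spirit of SORD-style encoding) combined with a per-pixel weighted cross-entropy. The distance penalty, the temperature `alpha`, and the weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch (not the paper's exact implementation): soft ordinal
# labels for 3 driveability levels, plus a per-pixel loss weight.
LEVELS = np.array([0, 1, 2])  # e.g. 0 = not driveable ... 2 = preferable

def soft_ordinal_label(true_level: int, alpha: float = 2.0) -> np.ndarray:
    """Softmax over negative squared rank distances to the true level,
    so probability mass spreads to ordinally adjacent classes."""
    logits = -alpha * (LEVELS - true_level) ** 2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def weighted_pixel_ce(soft_target: np.ndarray, pred_probs: np.ndarray,
                      weight: float) -> float:
    """Cross-entropy between a soft target and predicted class
    probabilities, scaled by a per-pixel weight (e.g. larger for
    safety-critical regions)."""
    return -weight * float(np.sum(soft_target * np.log(pred_probs + 1e-12)))

# A pixel whose true level is 1 gets non-zero mass on levels 0 and 2,
# encoding inter-class distance in the target distribution.
target = soft_ordinal_label(true_level=1)
print(target)  # approx. [0.106, 0.787, 0.106] for alpha = 2.0
```

Under this kind of encoding, predicting a neighbouring driveability level is penalized less than predicting a distant one, which is the behaviour one would want when levels are ordered by navigational risk.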