Effective use of camera-based vision systems is essential for robust autonomous off-road driving, particularly in the high-speed regime. Despite their success in structured, on-road settings, current end-to-end approaches to scene prediction have yet to be adapted successfully to complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system that predicts semantic and geometric terrain properties for aggressive off-road navigation. The approach relies on several key insights and practical considerations for reliable terrain modeling. The network uses a multi-headed output representation to capture the fine- and coarse-grained terrain features needed to estimate traversability. Accurate depth estimation is achieved through self-supervised depth completion with multi-view RGB and stereo inputs. Real-time performance requirements are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integrating it into a planning module. We demonstrate the performance of TerrainNet through extensive comparison with current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet into a complete autonomous-driving stack through a real-world vehicle test in a challenging off-road scenario.