Estimating 3D occupancy from surround-view images is an exciting development in autonomous driving, following the success of bird's-eye-view (BEV) perception. The task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, there is still no baseline that defines the task in terms of network design, optimization, and evaluation. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for the task. In addition, we explore the relationship between 3D occupancy estimation and related tasks such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the occupancy metric, which is flexible enough for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy