The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, the task still lacks a well-defined baseline covering network design, optimization, and evaluation. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors of the task. In addition, we explore the relationship between 3D occupancy estimation and related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the occupancy metric, which is flexible enough to apply to current public datasets. Moreover, we establish a new benchmark in terms of depth estimation metrics, comparing our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available at https://github.com/GANWANSHUI/SimpleOccupancy
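Since the benchmark compares against monocular depth estimation methods, a minimal sketch of the standard depth metrics commonly used in such comparisons is shown below. The exact metric set, depth range, and masking used in the paper are assumptions here; the function is illustrative only.

```python
# Minimal sketch of standard monocular depth evaluation metrics
# (Abs Rel, Sq Rel, RMSE, RMSE log, and delta-threshold accuracies).
# The depth range and masking are assumed, not taken from the paper.
import numpy as np

def depth_metrics(pred, gt, min_depth=0.1, max_depth=80.0):
    """Compute standard depth metrics over valid ground-truth pixels."""
    mask = (gt > min_depth) & (gt < max_depth)  # keep only valid GT depths
    pred, gt = pred[mask], gt[mask]
    pred = np.clip(pred, min_depth, max_depth)  # clamp predictions to the range

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)
```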