3D detection plays an indispensable role in environment perception. Because the commonly used LiDAR sensors are expensive, stereo-vision-based 3D detection has recently attracted growing attention as an economical yet effective alternative. For such image-based approaches, accurate depth information is the key to 3D detection, and most existing methods rely on a preliminary depth-estimation stage. These methods mainly focus on global depth and neglect two properties of depth information in this specific task, namely sparsity and locality: precise depth is needed only for the 3D bounding boxes. Motivated by this observation, we propose a stereo-image-based, anchor-free 3D detection method called the structure-aware stereo 3D detector (SIDE), which explores instance-level depth information by constructing a cost volume from the RoIs of each object. Because the local cost volume carries sparse information, we further introduce match reweighting and structure-aware attention to make the depth information more concentrated. Experiments on the KITTI dataset show that our method achieves state-of-the-art performance among existing methods that do not use depth map supervision.
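To make the idea of an instance-level cost volume concrete, the following is a minimal sketch, not the authors' implementation. It assumes that RoI-aligned feature maps of shape `(C, H, W)` are already available for the same object in the left and right images, uses a simple channel-wise correlation as the matching cost, and reads out a per-pixel disparity with a softmax over the disparity axis. The function names `roi_cost_volume` and `expected_disparity` are hypothetical, and the match reweighting and structure-aware attention described in the abstract are not modeled here.

```python
import numpy as np


def roi_cost_volume(left_feat, right_feat, max_disp):
    """Build a correlation cost volume for a single object RoI.

    left_feat, right_feat: arrays of shape (C, H, W), assumed to be
    RoI-aligned features of the same object in the left/right image.
    Returns an array of shape (max_disp, H, W) where entry [d, y, x] is
    the channel-averaged product of left[:, y, x] and right[:, y, x - d].
    Positions with x - d < 0 have no match and are left at zero.
    """
    _, h, w = left_feat.shape
    cost = np.zeros((max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[d] = (left_feat * right_feat).mean(axis=0)
        else:
            # Shift the right features by d pixels before correlating.
            cost[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, :-d]).mean(axis=0)
    return cost


def expected_disparity(cost):
    """Softmax over the disparity axis, then the expected disparity per pixel.

    Higher correlation means a better match, so the softmax is applied to
    the cost directly (illustrative choice, not taken from the paper).
    """
    logits = cost - cost.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0], dtype=cost.dtype).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)
```

Because the volume is built only over the RoI rather than the full image, its size scales with the number and size of detected objects, which reflects the sparsity and locality argument made above.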