3D object detection from a single image is an important task in autonomous driving (AD), and various approaches have been proposed for it. However, the task is intrinsically ambiguous and challenging, since depth estimation from a single image is already an ill-posed problem. In this paper, we propose an instance-aware approach that aggregates useful information to improve the accuracy of 3D object detection, with the following contributions. First, we propose an instance-aware feature aggregation (IAFA) module that collects local and global features for 3D bounding box regression. Second, we empirically find that the spatial attention module can be learned well when coarse-level instance annotations are used as a supervision signal. The proposed module significantly boosts the performance of the baseline method on both 3D and 2D bird's-eye-view (BEV) vehicle detection across all three categories. Third, our method outperforms all single-image-based approaches (even those trained with depth as an auxiliary input) and achieves state-of-the-art 3D detection performance on the KITTI benchmark.
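The abstract does not say how IAFA is built, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes that per-pixel spatial attention weights come from a sigmoid over attention logits, and that these weights pool an instance-focused "local" feature, which is then concatenated with a plain global average of the feature map. It also assumes the attention map is supervised with binary cross-entropy against a coarse instance mask. All function names here (`attention_weights`, `aggregate`, `attention_bce`) are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch; this is NOT the paper's IAFA implementation.
Grid = List[List[float]]               # H x W scalar map
FeatureMap = List[List[List[float]]]   # H x W x C feature map


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_weights(logits: Grid) -> Grid:
    """Map per-pixel attention logits to weights in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in logits]


def aggregate(features: FeatureMap, weights: Grid) -> List[float]:
    """Concatenate an attention-weighted (instance-focused) feature
    with a global average feature (assumed aggregation scheme)."""
    channels = len(features[0][0])
    local = [0.0] * channels
    global_ = [0.0] * channels
    total_w = 0.0
    count = 0
    for feat_row, w_row in zip(features, weights):
        for feat, w in zip(feat_row, w_row):
            total_w += w
            count += 1
            for c in range(channels):
                local[c] += w * feat[c]    # weighted sum for the local feature
                global_[c] += feat[c]      # unweighted sum for the global feature
    local = [v / total_w for v in local]
    global_ = [v / count for v in global_]
    return local + global_


def attention_bce(weights: Grid, coarse_mask: Grid, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between the attention map and a
    coarse instance mask (1 = instance pixel, 0 = background)."""
    loss, n = 0.0, 0
    for w_row, m_row in zip(weights, coarse_mask):
        for w, m in zip(w_row, m_row):
            w = min(max(w, eps), 1.0 - eps)  # clamp to avoid log(0)
            loss -= m * math.log(w) + (1.0 - m) * math.log(1.0 - w)
            n += 1
    return loss / n


if __name__ == "__main__":
    feats = [[[1.0], [2.0]], [[3.0], [4.0]]]          # 2 x 2 map, 1 channel
    w = attention_weights([[10.0, -10.0], [-10.0, -10.0]])
    print(aggregate(feats, w))                       # local ~1.0, global 2.5
    print(attention_bce(w, [[1.0, 0.0], [0.0, 0.0]]))  # small loss
```

Here the attention focuses on the top-left pixel, so the local feature is close to that pixel's value while the global feature stays at the map mean. The loss is small when the attention map agrees with the coarse mask. Concatenating the two features is one simple way to expose both instance-level and scene-level context to a box-regression head.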