Stereo-based 3D detection aims to detect 3D object bounding boxes from stereo images using intermediate depth maps or implicit 3D geometry representations, providing a low-cost solution for 3D perception. However, its performance still lags behind that of LiDAR-based detection algorithms. LiDAR-based models can encode accurate object boundaries and surface normal directions from LiDAR point clouds, which lets them detect and localize 3D bounding boxes accurately. In contrast, the results of stereo-based detectors are easily affected by erroneous depth features caused by the limitations of stereo matching. To address this problem, we propose LIGA-Stereo (LiDAR Geometry Aware Stereo Detector), which learns stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models. In addition, we found that existing voxel-based stereo detectors fail to learn semantic features effectively from indirect 3D supervision. We therefore attach an auxiliary 2D detection head to provide direct 2D semantic supervision. Experimental results show that these two strategies improve the geometric and semantic representation capabilities. Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark. The gap between stereo-based and LiDAR-based 3D detectors is further narrowed.
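To make the idea of LiDAR-guided learning more concrete, the following is a minimal sketch of a feature-imitation objective, in which the stereo model (student) is pushed toward the features of a LiDAR model (teacher), combined with the 3D detection loss and the auxiliary 2D detection loss. The abstract does not specify the form of the loss, so the masked squared-error formulation, the function names `imitation_loss` and `total_loss`, the foreground mask, and the weights `w_im` and `w_2d` are all assumptions made for illustration and may differ from the actual method.

```python
from typing import Sequence


def imitation_loss(
    stereo_feats: Sequence[Sequence[float]],
    lidar_feats: Sequence[Sequence[float]],
    mask: Sequence[int],
) -> float:
    """Mean squared L2 distance between student and teacher features.

    Each entry of stereo_feats / lidar_feats is one voxel's channel vector.
    Only voxels whose mask entry is truthy contribute (hypothetical
    foreground mask; the masking scheme is an assumption).
    """
    if not (len(stereo_feats) == len(lidar_feats) == len(mask)):
        raise ValueError("features and mask must have the same length")

    total = 0.0
    count = 0
    for student, teacher, keep in zip(stereo_feats, lidar_feats, mask):
        if not keep:
            continue
        if len(student) != len(teacher):
            raise ValueError("channel dimensions must match")
        total += sum((s - t) ** 2 for s, t in zip(student, teacher))
        count += 1

    # Return 0.0 when no voxel is selected, avoiding a division by zero.
    return total / count if count else 0.0


def total_loss(
    det_3d: float,
    imitation: float,
    det_2d: float,
    w_im: float = 1.0,
    w_2d: float = 1.0,
) -> float:
    """Weighted sum of the 3D detection, imitation, and auxiliary 2D losses.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    return det_3d + w_im * imitation + w_2d * det_2d
```

In a real training pipeline the same computation would be applied to feature tensors in a deep learning framework, with the teacher features typically held fixed so that gradients flow only into the stereo model.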