Fast stereo-based 3D object detectors have made great progress recently. However, they still lag far behind high-precision stereo-based methods in accuracy. We argue that the main reason is the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key component of our ESGN is an efficient geometry-aware feature generation (EGFG) module. Our EGFG module first uses a stereo correlation and reprojection module to construct multi-scale stereo volumes in camera frustum space, and then employs a multi-scale BEV projection and fusion module to generate multiple geometry-aware features. In both steps, we adopt deep multi-scale information fusion to generate discriminative geometry-aware features without any complex aggregation networks. In addition, we introduce a deep geometry-aware feature distillation scheme that guides stereo feature learning with a LiDAR-based detector. Experiments are performed on the widely used KITTI dataset. On the KITTI test set, our ESGN outperforms the fast state-of-the-art detector YOLOStereo3D by 5.14\% on mAP$_{3d}$ with a runtime of 62\,ms. To the best of our knowledge, our ESGN achieves the best trade-off between accuracy and speed. We hope that our efficient stereo geometry network can open up new directions for fast 3D object detection. Our source code will be released.
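The abstract does not say how the stereo correlation is computed. As an illustration only, the sketch below builds multi-scale correlation volumes from left and right feature maps using a plain channel-averaged dot product at each candidate disparity, with 2×2 average pooling between scales. The function names (`correlation_volume`, `avg_pool2`, `multiscale_volumes`), the pooling choice and the halving of the disparity range per scale are all assumptions made for this example. They are not taken from ESGN, and the reprojection into frustum space and the BEV projection are not shown.

```python
import numpy as np


def correlation_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
    """Dot-product correlation volume of shape (max_disp, H, W) (illustrative assumption).

    left, right: rectified feature maps of shape (C, H, W).
    Entry [d, y, x] is the channel-averaged product of left[:, y, x] and
    right[:, y, x - d]; columns with x - d < 0 have no match and stay zero.
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[0] = (left * right).mean(axis=0)
        else:
            # Shift the right view d pixels so it aligns with the left view.
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 on a (C, H, W) map; odd edges are cropped."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def multiscale_volumes(left, right, max_disp, num_scales=3):
    """One correlation volume per scale; resolution and disparity range halve per scale."""
    volumes = []
    for s in range(num_scales):
        if s > 0:
            left, right = avg_pool2(left), avg_pool2(right)
            max_disp = max(1, max_disp // 2)
        volumes.append(correlation_volume(left, right, max_disp))
    return volumes


# Usage: random features standing in for the backbone outputs of a stereo pair.
rng = np.random.default_rng(0)
feat_l = rng.standard_normal((16, 24, 80)).astype(np.float32)
feat_r = rng.standard_normal((16, 24, 80)).astype(np.float32)
for vol in multiscale_volumes(feat_l, feat_r, max_disp=24):
    print(vol.shape)  # (24, 24, 80), then (12, 12, 40), then (6, 6, 20)
```

Each disparity plane $d$ maps to depth $z = fB/d$ for focal length $f$ and baseline $B$. In a full pipeline, this relation is what allows a disparity-indexed volume to be resampled into depth-indexed camera frustum space before any BEV projection.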