Most state-of-the-art 3D object detectors rely heavily on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods. This gap stems from how the representation used for prediction is formed in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes the 3D geometric structure of regular 3D space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (about 10 higher in AP) and even achieves performance comparable to several LiDAR-based methods on the KITTI 3D object detection leaderboard. Our code is publicly available at https://github.com/chenyilun95/DSGN.
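To make the link between regular 3D space and stereo image geometry concrete, the following is a minimal sketch of how the center of a 3D voxel could be mapped to a left-image pixel and a stereo disparity under a standard rectified pinhole stereo model. It is an illustration of the general geometry only, not the DSGN implementation: the function name `voxel_to_stereo_coords` and the parameters `fx`, `fy`, `cx`, `cy`, and `baseline` are hypothetical, and the calibration values in the usage lines are placeholders rather than actual KITTI calibration.

```python
import numpy as np


def voxel_to_stereo_coords(points_xyz, fx, fy, cx, cy, baseline):
    """Map 3D camera-frame points to (u, v, disparity) in a rectified stereo pair.

    Assumes a pinhole model: x points right, y points down, z points forward,
    with all lengths in the same unit as `baseline`. All names are hypothetical.
    """
    points_xyz = np.asarray(points_xyz, dtype=np.float64)
    x = points_xyz[..., 0]
    y = points_xyz[..., 1]
    z = points_xyz[..., 2]
    if np.any(z <= 0):
        raise ValueError("all points must lie in front of the camera (z > 0)")

    # Perspective projection into the left image.
    u = fx * x / z + cx
    v = fy * y / z + cy
    # For rectified stereo, disparity is inversely proportional to depth.
    disparity = fx * baseline / z
    return np.stack([u, v, disparity], axis=-1)


# Usage with placeholder calibration values (assumed, not from the paper).
voxel_centers = np.array([[0.0, 0.0, 10.0], [1.0, 0.5, 20.0]])
coords = voxel_to_stereo_coords(
    voxel_centers, fx=700.0, fy=700.0, cx=600.0, cy=180.0, baseline=0.54
)
```

Because every step is a smooth function of the inputs, a mapping of this kind can in principle sit inside an end-to-end trainable pipeline, where image features are sampled at `(u, v, disparity)` for each voxel of a regular 3D grid.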