Most state-of-the-art 3D object detectors rely heavily on LiDAR sensors, and a large performance gap remains between image-based and LiDAR-based methods. This gap is caused by representations that are poorly suited to prediction in 3D scenes. Our method, Deep Stereo Geometry Network (DSGN), significantly narrows this gap by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes the 3D geometric structure of a regular 3D space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates depth and detects 3D objects in an end-to-end manner. Our approach outperforms previous stereo-based 3D detectors (by about 10 points in AP) and even achieves performance comparable to some LiDAR-based methods on the KITTI 3D object detection leaderboard. Code will be made publicly available.
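To make the idea of learning depth on a differentiable stereo volume more concrete, here is a minimal NumPy sketch of a plane-sweep cost volume with soft-argmin depth regression. These are standard building blocks in stereo networks (e.g. PSMNet). This is not DSGN's implementation. The abstract does not specify how the volume is built, so the following are all hypothetical simplifications: the function names, the hand-crafted L1 matching cost (a learned network would normally process the volume instead), the `sharpness` parameter, and the assumption of a rectified pair with disparity = focal * baseline / depth.

```python
import numpy as np


def plane_sweep_volume(left, right, depths, focal, baseline):
    """Hypothetical sketch: build a (D, H, W) matching-cost volume.

    left, right: (C, H, W) feature maps of a rectified stereo pair (assumed).
    depths: (D,) candidate depths; a left pixel at column x is assumed to match
    right column x - focal * baseline / depth.
    """
    C, H, W = left.shape
    xs = np.arange(W, dtype=np.float64)
    cost = np.empty((len(depths), H, W))
    for i, d in enumerate(depths):
        disparity = focal * baseline / d
        src = xs - disparity                      # sub-pixel column in the right image
        x0 = np.floor(src).astype(int)
        w1 = src - x0                             # linear-interpolation weight (differentiable in src)
        x0c = np.clip(x0, 0, W - 1)               # clamp out-of-view samples to the border
        x1c = np.clip(x0 + 1, 0, W - 1)
        warped = (1.0 - w1) * right[:, :, x0c] + w1 * right[:, :, x1c]
        # Simplified L1 cost averaged over channels (a stand-in for a learned cost).
        cost[i] = np.abs(left - warped).mean(axis=0)
    return cost


def soft_argmin_depth(cost, depths, sharpness=1.0):
    """Differentiable depth estimate: softmax over -cost, then expected depth."""
    logits = -sharpness * cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(depths, prob, axes=(0, 0))       # (H, W) expected depth
```

Because the depth estimate is an expectation over candidate depths rather than a hard argmin, gradients flow back through the volume. This property is what allows depth and detection to be trained jointly end to end.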