Existing deep learning-based approaches to monocular 3D object detection in autonomous driving often model the object as a rotated 3D cuboid and ignore its geometric shape. In this work, we propose an approach for incorporating shape-aware 2D/3D constraints into the 3D detection framework. Specifically, we first employ a deep neural network to learn distinctive 2D keypoints in the image domain and to regress their corresponding 3D coordinates in the local 3D object coordinate system. The 2D/3D geometric constraints are then built from these correspondences for each object to boost detection performance. To generate ground-truth 2D/3D keypoints, we propose an automatic model-fitting approach that fits a deformed 3D object model to the object mask in the 2D image. The proposed framework is verified on the public KITTI dataset, and the experimental results demonstrate that the additional geometric constraints significantly improve detection performance over the baseline method. More importantly, the proposed framework achieves state-of-the-art performance while running in real time. Data and code will be available at https://github.com/zongdai/AutoShape
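As a rough illustration of how 2D/3D keypoint correspondences can constrain an object's pose, the sketch below recovers an object's translation by linear least squares on the reprojection equations. It is not the AutoShape implementation. It assumes a skew-free pinhole camera, a known yaw angle about the camera Y axis (the KITTI convention), and noise-free keypoints; all function names and numeric values are hypothetical.

```python
import math

def rotate_y(point, yaw):
    """Rotate a 3D point about the camera Y axis (KITTI-style yaw)."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

def solve_3x3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = (a[i][3] - tail) / a[i][i]
    return x

def solve_translation(kps_2d, kps_3d_local, yaw, fx, fy, cx, cy):
    """Least-squares translation t such that project(R(yaw) p + t) matches each 2D keypoint.

    Each correspondence gives two equations that are linear in t:
        fx * tx - (u - cx) * tz = (u - cx) * qz - fx * qx
        fy * ty - (v - cy) * tz = (v - cy) * qz - fy * qy
    where q = R(yaw) p is the rotated local keypoint.
    """
    rows, rhs = [], []
    for (u, v), p in zip(kps_2d, kps_3d_local):
        qx, qy, qz = rotate_y(p, yaw)
        du, dv = u - cx, v - cy
        rows.append((fx, 0.0, -du))
        rhs.append(du * qz - fx * qx)
        rows.append((0.0, fy, -dv))
        rhs.append(dv * qz - fy * qy)
    # Normal equations (A^T A) t = A^T b for the stacked system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve_3x3(ata, atb)

# Synthetic usage: project hypothetical keypoints with a known pose, then recover it.
fx, fy, cx, cy = 720.0, 720.0, 610.0, 175.0   # made-up intrinsics
local_kps = [(1.9, 0.0, 0.8), (1.9, 0.0, -0.8), (-1.9, 0.0, 0.8),
             (-1.9, -1.4, -0.8), (0.0, -1.5, 0.0)]
yaw, t_true = 0.3, (2.0, 1.5, 20.0)
pixels = [project(tuple(q + t for q, t in zip(rotate_y(p, yaw), t_true)),
                  fx, fy, cx, cy) for p in local_kps]
t_est = solve_translation(pixels, local_kps, yaw, fx, fy, cx, cy)
```

With more keypoints than unknowns the system is overdetermined, which is why denser shape-aware correspondences can give a better-constrained solution than the few corners of a cuboid. A practical system would also need to handle keypoint noise and jointly estimate orientation and dimensions, which this sketch leaves out.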