Data augmentation is a key component of CNN-based image recognition tasks such as object detection. However, it is comparatively little explored for 3D object detection. Many standard 2D object detection data augmentation techniques do not extend to 3D boxes. Extending these augmentations to 3D object detection requires adapting the 3D geometry of the input scene and synthesizing new viewpoints. This in turn requires accurate depth information for the scene, which may not always be available. In this paper, we evaluate existing 2D data augmentations and propose two novel augmentations for monocular 3D detection that do not require novel view synthesis. We evaluate these augmentations first on the RTM3D detection model, owing to its shorter training times. Over the baseline on the KITTI car detection dataset, we obtain a consistent improvement of 4% in 3D AP (@IoU=0.7) for cars and of ~1.8% in 3D AP (@IoU=0.25) for pedestrians and cyclists. We also demonstrate a rigorous evaluation of the mAP scores by re-weighting them to take into account the class imbalance in the KITTI validation dataset.
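The abstract does not state how the mAP scores are re-weighted, so the following is only a minimal sketch of one plausible reading: each class AP is weighted by a per-class weight (here, the number of validation instances of that class) instead of being averaged uniformly. The function names `weighted_map` and `plain_map`, the choice of instance counts as weights, and all numbers below are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict


def plain_map(ap_per_class: Dict[str, float]) -> float:
    """Uniform mean of the per-class AP values (the usual mAP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


def weighted_map(ap_per_class: Dict[str, float],
                 weight_per_class: Dict[str, float]) -> float:
    """Mean of per-class AP values, each scaled by a normalized class weight.

    Assumption: the weights are whatever the evaluator chooses to reflect
    class imbalance (instance counts here); the paper's exact scheme is
    not given in the abstract.
    """
    total = sum(weight_per_class[c] for c in ap_per_class)
    if total <= 0:
        raise ValueError("class weights must sum to a positive value")
    return sum(ap_per_class[c] * weight_per_class[c]
               for c in ap_per_class) / total


# Hypothetical per-class AP values and validation instance counts.
example_ap = {"Car": 0.20, "Pedestrian": 0.10, "Cyclist": 0.06}
example_counts = {"Car": 800, "Pedestrian": 150, "Cyclist": 50}

uniform_score = plain_map(example_ap)
reweighted_score = weighted_map(example_ap, example_counts)
```

With these made-up numbers the uniform mean treats all three classes equally, while the count-weighted mean is dominated by the majority class. Using inverse-frequency weights instead would shift the emphasis toward the rare classes; which direction is appropriate depends on the evaluation goal.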