Monocular 3D object detection (Mono3D) has achieved tremendous progress with the emergence of large-scale autonomous driving datasets and the rapid development of deep learning techniques. However, due to severe domain gaps (e.g., differences in field of view (FOV), pixel size, and object size across datasets), Mono3D detectors struggle to generalize, suffering drastic performance degradation on unseen domains. To address these issues, we combine the position-invariant transform and multi-scale training with the pixel-size depth strategy to construct an effective unified camera-generalized paradigm (CGP), which fully accounts for discrepancies in the FOV and pixel size of images captured by different cameras. Moreover, through an exhaustive systematic study, we further investigate the degradation of quantitative metrics under cross-dataset inference, and discern that the bias in predicted object size is a dominant cause of failure. Hence, we propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation. Our method, dubbed DGMono3D, achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme even without utilizing data from the target domain.
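For intuition, the camera-geometry reasoning behind a pixel-size-aware depth strategy can be sketched with standard pinhole projection. The snippet below is a minimal illustration, not the paper's implementation: the helper names `depth_from_projection` and `rescale_depth_across_cameras` are hypothetical, and the only assumption is the pinhole relation z = f * H / h (depth from focal length, metric object height, and pixel height), which implies depth predictions must be rescaled by the focal-length ratio when transferring across cameras.

```python
# Minimal sketch (assumed pinhole model; helper names are hypothetical,
# not from the DGMono3D paper).

def depth_from_projection(focal_px: float, obj_height_m: float,
                          pixel_height: float) -> float:
    """Recover metric depth from the pinhole relation z = f * H / h,
    where f is the focal length in pixels, H the metric object height,
    and h the object's height in pixels."""
    return focal_px * obj_height_m / pixel_height


def rescale_depth_across_cameras(depth: float, focal_src: float,
                                 focal_tgt: float) -> float:
    """Map a depth estimate made under focal length focal_src to a camera
    with focal length focal_tgt, keeping the 2D projection consistent.
    Note that when an image is resized by a factor r (multi-scale
    training), f and h scale together, so z = f * H / h is unchanged."""
    return depth * focal_tgt / focal_src


if __name__ == "__main__":
    # A 1.5 m tall object spanning 100 px under a 1000 px focal length
    # lies at 15 m; the same projection under a 2000 px focal length
    # corresponds to 30 m.
    z = depth_from_projection(1000.0, 1.5, 100.0)
    print(z)                                                 # 15.0
    print(rescale_depth_across_cameras(z, 1000.0, 2000.0))   # 30.0
```

Under this model, a detector trained on one camera systematically mis-scales depth on a camera with a different focal length or pixel size, which is the discrepancy the CGP above is designed to absorb.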