Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, the drastic performance degradation seen in practical cross-domain deployment remains an understudied challenge, owing to the lack of labels on the target domain. In this paper, we first comprehensively investigate the main underlying factor of the domain gap in Mono3D; the critical observation is a depth-shift issue caused by the geometric misalignment of domains. We then propose STMono3D, a new self-teaching framework for unsupervised domain adaptation (UDA) on Mono3D. To mitigate the depth shift, we introduce a geometry-aligned multi-scale training strategy that disentangles the camera parameters and guarantees the geometric consistency of domains. Building on this, we develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain. Because the end-to-end framework provides richer information about the pseudo labels, we propose a quality-aware supervision strategy that takes instance-level pseudo confidences into account and improves the effectiveness of target-domain training. Moreover, a positive focusing training strategy and a dynamic threshold are proposed to handle the large numbers of false-negative (FN) and false-positive (FP) pseudo samples. STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset. To the best of our knowledge, this is the first study to explore effective UDA methods for Mono3D.
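The depth-shift issue can be illustrated with simple pinhole-camera arithmetic. The sketch below is not the STMono3D implementation, and the abstract does not specify how the geometry alignment is carried out. It only shows, under assumed numbers (a 1.5 m tall car at 20 m, a focal length of 700 px), why depth inferred from apparent size shifts when an image is resized without updating the intrinsics. The names `Intrinsics`, `rescale_intrinsics`, `projected_height_px`, and `depth_from_height` are hypothetical helpers introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical helper type)."""
    fx: float
    fy: float
    cx: float
    cy: float


def rescale_intrinsics(k: Intrinsics, scale: float) -> Intrinsics:
    """Resizing an image by `scale` scales focal lengths and principal point alike."""
    return Intrinsics(k.fx * scale, k.fy * scale, k.cx * scale, k.cy * scale)


def projected_height_px(k: Intrinsics, height_m: float, depth_m: float) -> float:
    """Apparent pixel height of an object under the pinhole model: h = fy * H / Z."""
    return k.fy * height_m / depth_m


def depth_from_height(k: Intrinsics, height_m: float, height_px: float) -> float:
    """Invert the pinhole relation: Z = fy * H / h."""
    return k.fy * height_m / height_px


# Assumed example values, chosen only for illustration.
source_cam = Intrinsics(fx=700.0, fy=700.0, cx=640.0, cy=360.0)
car_height_m, true_depth_m = 1.5, 20.0

# The same car seen after the image is downscaled by 0.5.
scale = 0.5
h_full = projected_height_px(source_cam, car_height_m, true_depth_m)
h_resized = h_full * scale

# Interpreting the resized image with the ORIGINAL intrinsics shifts the depth.
shifted_depth = depth_from_height(source_cam, car_height_m, h_resized)

# Interpreting it with intrinsics rescaled alongside the image recovers the depth.
aligned_cam = rescale_intrinsics(source_cam, scale)
aligned_depth = depth_from_height(aligned_cam, car_height_m, h_resized)

print(f"true depth:    {true_depth_m:.1f} m")
print(f"shifted depth: {shifted_depth:.1f} m")
print(f"aligned depth: {aligned_depth:.1f} m")
```

In this toy setting, halving the image while keeping the old focal length doubles the inferred depth, whereas keeping the intrinsics consistent with the image scale removes the shift. A detector trained on one camera and deployed on another with a different focal length faces the same kind of mismatch, which is one plausible reading of the geometric misalignment described above.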