Vision-centric Bird's-Eye-View (BEV) perception has shown promising potential and attracted increasing attention in autonomous driving. Recent works mainly focus on improving efficiency or accuracy but neglect the domain shift problem, resulting in severe degradation of transfer performance. Through extensive observations, we identify significant domain gaps in scene, weather, and day-night changing scenarios, and make the first attempt to solve the domain adaptation problem for multi-view 3D object detection. Since BEV perception approaches are usually complicated and contain several components, the accumulation of domain shift across multiple latent spaces makes BEV domain adaptation challenging. In this paper, we propose a novel Multi-level Multi-space Alignment Teacher-Student ($M^{2}ATS$) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Multi-space Feature Aligned (MFA) student model. Specifically, the DAT model adopts uncertainty guidance to sample reliable depth information in the target domain. After constructing domain-invariant BEV perception, it transfers pixel- and instance-level knowledge to the student model. To further alleviate the domain shift at the global level, the MFA student model is introduced to align task-relevant multi-space features of the two domains. To verify the effectiveness of $M^{2}ATS$, we conduct BEV 3D object detection experiments on four cross-domain scenarios and achieve state-of-the-art performance (e.g., +12.6% NDS and +9.1% mAP on Day-Night). Code and dataset will be released.
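To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of one training step combining uncertainty-guided depth sampling in the teacher with pixel/instance-level distillation and global multi-space feature alignment. All module interfaces (the output keys `depth_logits`, `bev_feat`, `img_feat`, `inst_feat`, `det_loss`), the uncertainty threshold, and the loss weights are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def uncertainty_guided_depth_sampling(depth_logits, threshold=0.3):
    """Keep only depth predictions whose normalized entropy (uncertainty) is low.
    `depth_logits` has shape (B, D, H, W); `threshold` is an assumed hyper-parameter."""
    probs = depth_logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-6).log()).sum(dim=1)          # (B, H, W)
    max_entropy = torch.log(torch.tensor(float(depth_logits.shape[1])))
    reliable = (entropy / max_entropy) < threshold                        # boolean mask
    return probs.argmax(dim=1), reliable                                  # depth bins + mask


def m2ats_step(teacher, student, source_batch, target_batch,
               w_pix=1.0, w_inst=1.0, w_feat=0.1):
    """One illustrative training step: the Depth-Aware Teacher (DAT) distills
    pixel/instance-level knowledge on the target domain, and the Multi-space
    Feature Aligned (MFA) student aligns source/target features globally."""
    # --- Depth-Aware Teacher on the target domain (no gradients) ---
    with torch.no_grad():
        t_out = teacher(target_batch)
        depth, reliable = uncertainty_guided_depth_sampling(t_out["depth_logits"])

    s_tgt = student(target_batch)
    # Pixel-level distillation: supervise the student's depth distribution
    # only where the teacher is confident.
    loss_pix = F.cross_entropy(s_tgt["depth_logits"], depth, reduction="none")
    loss_pix = (loss_pix * reliable).sum() / reliable.sum().clamp_min(1)
    # Instance-level distillation: match instance features from the teacher
    # (placeholder L1; the real pipeline would pool features from pseudo boxes).
    loss_inst = F.l1_loss(s_tgt["inst_feat"], t_out["inst_feat"])

    # --- Multi-space Feature Alignment between domains (global level) ---
    s_src = student(source_batch)
    loss_feat = sum(
        F.mse_loss(s_src[k].mean(dim=(-2, -1)), s_tgt[k].mean(dim=(-2, -1)))
        for k in ("img_feat", "bev_feat")
    )

    # Supervised detection loss on the labeled source domain plus adaptation terms.
    return s_src["det_loss"] + w_pix * loss_pix + w_inst * loss_inst + w_feat * loss_feat
```

The alignment term here uses a simple mean-feature (moment-matching) distance for brevity; an adversarial or task-relevant alignment criterion could be substituted without changing the overall structure of the step.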