DEtection TRansformer (DETR), an end-to-end object detection pipeline, has recently achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To address this problem, we propose an end-to-end cross-domain detection transformer based on mean teacher knowledge transfer (MTKT), which transfers knowledge between domains via pseudo labels. Because the quality of pseudo labels in the target domain is crucial for domain adaptation, we design three levels of source-target feature alignment based on the Transformer architecture: domain query-based feature alignment (DQFA), bi-level-graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). These three levels of alignment match the global, local, and instance features between source and target, respectively. With these strategies, more accurate pseudo labels can be obtained and knowledge can be better transferred from source to target, improving the cross-domain capability of the detection transformer. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three domain adaptation scenarios; in particular, the result on the Sim10k-to-Cityscapes scenario improves from 52.6 mAP to 57.9 mAP. Code will be released.
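The abstract does not spell out how knowledge is transferred via pseudo labels, so the following is only a minimal sketch of the generic mean-teacher pattern, not the authors' implementation. It assumes the teacher's weights are an exponential moving average (EMA) of the student's, and that teacher predictions on target images are kept as pseudo labels only above a confidence threshold. The names `Detection`, `ema_update`, and `select_pseudo_labels`, the momentum of 0.999, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One predicted object (hypothetical container for this sketch)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: int
    score: float  # teacher confidence in [0, 1]


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               momentum: float = 0.999) -> None:
    """Update teacher weights in place as an EMA of the student weights.

    The momentum value is an assumption, not taken from the paper.
    """
    for name, student_vals in student.items():
        teacher[name] = [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher[name], student_vals)
        ]


def select_pseudo_labels(teacher_preds: List[Detection],
                         threshold: float = 0.5) -> List[Detection]:
    """Keep only teacher predictions confident enough to supervise the student.

    The threshold value is an assumption, not taken from the paper.
    """
    return [d for d in teacher_preds if d.score >= threshold]


if __name__ == "__main__":
    # Toy weights standing in for real model parameters.
    teacher = {"w": [1.0, 2.0]}
    student = {"w": [0.0, 4.0]}
    ema_update(teacher, student, momentum=0.9)
    print(teacher)  # teacher weights have moved slightly toward the student

    preds = [Detection((0, 0, 1, 1), label=1, score=0.8),
             Detection((0, 0, 2, 2), label=2, score=0.3)]
    print(select_pseudo_labels(preds))  # only the high-confidence box remains
```

In a real training loop the student would be optimized on labeled source images plus the filtered pseudo labels on target images, with `ema_update` called after each student step. The three alignment modules described above would serve to make the teacher's target-domain predictions more reliable before filtering.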