Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous approaches to UDA-OD based on self-training have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also impact the likelihood that different objects will occur, termed class distribution shift. Motivated by this, we propose a framework for explicitly addressing class distribution shift to improve pseudo-label reliability in self-training. Our approach uses the domain invariance and contextual understanding of a pre-trained joint vision and language model to predict the class distribution of unlabelled data. By aligning the class distribution of pseudo-labels with this prediction, we provide weak supervision of pseudo-label accuracy. To further account for low-quality pseudo-labels early in self-training, we propose an approach to dynamically adjust the number of pseudo-labels per image based on model confidence. Our method outperforms state-of-the-art approaches on several benchmarks, including a 4.7 mAP improvement when facing challenging class distribution shift.
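To illustrate the alignment step, below is a minimal sketch of how pseudo-labels could be selected so that their class frequencies match a predicted class distribution, with the total label budget scaled by mean detector confidence (fewer labels are kept when the model is uncertain, as early in self-training). The function name, arguments, and budgeting rule are illustrative assumptions, not the authors' implementation; the target distribution is assumed to come from zero-shot classification of the unlabelled images with a pre-trained vision and language model such as CLIP.

```python
import numpy as np

def select_pseudo_labels(scores, labels, target_dist, max_total):
    """Hypothetical sketch: keep top-scoring detections per class so that
    pseudo-label class frequencies match a predicted class distribution.

    scores      : (N,) detector confidences for candidate detections
    labels      : (N,) predicted class index for each detection
    target_dist : (C,) class distribution predicted for the unlabelled data
                  (e.g. via zero-shot CLIP classification; sums to 1)
    max_total   : maximum pseudo-label budget

    Returns the indices of detections to keep as pseudo-labels.
    """
    # Dynamic budget: retain fewer pseudo-labels when mean confidence is low,
    # reflecting the lower label quality early in self-training.
    budget = int(round(max_total * scores.mean()))
    keep = []
    for c, frac in enumerate(target_dist):
        # Per-class quota implied by the predicted class distribution
        # (rounding means the totals may differ slightly from the budget).
        quota = int(round(budget * frac))
        cand = np.flatnonzero(labels == c)
        # Keep the `quota` most confident detections of this class.
        top = cand[np.argsort(scores[cand])[::-1][:quota]]
        keep.extend(top.tolist())
    return np.array(keep, dtype=int)
```

Under these assumptions, aligning per-class quotas with the predicted distribution acts as the weak supervision described above: classes the vision and language model deems frequent in the target domain receive proportionally more pseudo-labels, regardless of the detector's initial bias.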