Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, together with a text-based classification loss. Our experiments evidence the benefits of our approach, outperforming the only existing SDG object detection method, Single-DGOD [49], by 10% on its own diverse-weather driving benchmark.
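As a rough illustration of the kind of text-based classification loss mentioned above (a minimal sketch, not the paper's actual implementation; all names and the temperature value are hypothetical), one can score detector region features against class-name text embeddings by cosine similarity and apply cross-entropy over the resulting logits:

```python
import numpy as np

def text_based_cls_loss(region_feats, text_embeds, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between region features
    and per-class text embeddings (CLIP-style zero-shot classifier head).

    region_feats: (N, D) features from the detector backbone (hypothetical)
    text_embeds:  (C, D) text embeddings of the class prompts (hypothetical)
    labels:       (N,) ground-truth class indices
    """
    # L2-normalize both sides so the dot product is a cosine similarity
    f = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = (f @ t.T) / temperature  # (N, C)
    # numerically stable log-softmax followed by negative log-likelihood
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Features that align with the text embedding of their ground-truth class yield a low loss, which is the signal used to keep the detector's representation tied to the semantic class prompts.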