Adapting a segmentation model from a labeled source domain to a target domain, where only a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is known as one-shot unsupervised domain adaptation (OSUDA). Most prior works have addressed this problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target ``texture'' information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface of our method, Data AugmenTation with diffUsion Models (DATUM), allows us to guide the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that DATUM surpasses state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM
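To make the text-guided generation step concrete, below is a minimal sketch of producing synthetic target-style images with a text-to-image diffusion model via the Hugging Face diffusers library. The checkpoint identifier, the prompt wording, and the use of an off-the-shelf (non-personalized) pipeline are illustrative assumptions; the abstract does not specify how DATUM conditions the model on the single target image.

```python
# Minimal sketch (assumptions: diffusers/Stable Diffusion checkpoint and prompt
# are illustrative; DATUM's actual conditioning on the one-shot target image
# is not detailed in this abstract).
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available Stable Diffusion checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A hypothetical prompt steering generation towards target-domain driving scenes.
prompt = "a photo of a city street at dusk, wet road, dense traffic"

# Generate a small batch of synthetic target-style images.
synthetic_images = []
for seed in range(4):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=50,
                 guidance_scale=7.5, generator=generator).images[0]
    synthetic_images.append(image)

# These images could then serve as an unlabeled synthetic target dataset
# for the downstream adaptation of the segmentation model.
```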