Using synthetic data to train neural networks that achieve good performance on real-world data is an important task, as it can reduce the need for costly data annotation. Yet, synthetic and real-world data exhibit a domain gap. Reducing this gap, also known as domain adaptation, has been widely studied in recent years. Closing the domain gap between the source (synthetic) and target (real) data by adapting directly between the two is challenging. In this work, we propose a novel two-stage framework for improving unsupervised domain adaptation (UDA) techniques on image data. In the first stage, we progressively train a multi-scale neural network to perform image translation from the source domain to the target domain. We denote the resulting transformed data as "Source in Target" (SiT). We then feed the generated SiT data as input to any standard UDA approach. This new data has a reduced domain gap from the desired target domain, which makes it easier for the applied UDA approach to close the gap further. We demonstrate the effectiveness of our method by comparing it to other leading UDA and image-to-image translation techniques when used as SiT generators. Moreover, we show that our framework improves three state-of-the-art UDA methods for semantic segmentation, HRDA, DAFormer and ProDA, on two UDA tasks: GTA5 to Cityscapes and Synthia to Cityscapes.
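To make the two-stage structure concrete, the following is a minimal, illustrative sketch in PyTorch. The class `MultiScaleGenerator`, the function `translate_progressively`, and the scale schedule are hypothetical placeholders chosen for illustration; they are not the paper's actual architecture, loss functions, or progressive training procedure. The sketch only shows how stage 1 produces SiT images from coarse to fine resolutions, and how those images would then stand in for the source images fed to a standard UDA method in stage 2.

```python
# Sketch of the two-stage pipeline (hypothetical names and architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleGenerator(nn.Module):
    """Toy source-to-target translator, applied at several image scales."""

    def __init__(self, channels: int = 3, base: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Conv2d(base, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.decoder(self.encoder(x)))


def translate_progressively(gen: nn.Module,
                            source_img: torch.Tensor,
                            scales=(0.25, 0.5, 1.0)) -> torch.Tensor:
    """Stage 1 (inference view): translate from coarse to fine scales,
    feeding each coarser output into the next, finer resolution."""
    out = None
    for s in scales:
        size = (int(source_img.shape[-2] * s), int(source_img.shape[-1] * s))
        x = F.interpolate(source_img, size=size, mode="bilinear",
                          align_corners=False)
        if out is not None:
            # Inject the previous (coarser) translation as a residual cue.
            x = x + F.interpolate(out, size=size, mode="bilinear",
                                  align_corners=False)
        out = gen(x)
    return out  # "Source in Target" (SiT) image at full resolution


if __name__ == "__main__":
    gen = MultiScaleGenerator()
    synthetic_batch = torch.rand(1, 3, 256, 512)  # stand-in for a GTA5/Synthia crop
    sit_batch = translate_progressively(gen, synthetic_batch)
    # Stage 2: sit_batch, paired with the original source labels, would replace
    # the source images fed to any standard UDA method (e.g. HRDA, DAFormer, ProDA).
    print(sit_batch.shape)  # torch.Size([1, 3, 256, 512])
```

Because stage 2 only swaps the source images for their SiT counterparts, this sketch keeps the downstream UDA method untouched, reflecting the plug-and-play nature of the framework described above.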