Training a deep learning model with artificially generated data can be an alternative when real training data are scarce, yet such a model often generalizes poorly due to the large domain gap between synthetic and real data. In this paper, we characterize the domain gap using a causal framework for data generation. We assume that real and synthetic data share the same content variables but differ in their style variables; a model trained on a synthetic dataset may therefore generalize poorly because it learns the nuisance style variables. To address this, we propose causal invariance learning, which encourages the model to learn a style-invariant representation and thereby enhances syn-to-real generalization. Furthermore, we propose a simple yet effective feature distillation method that prevents catastrophic forgetting of semantic knowledge of the real domain. In sum, we refer to our method as Guided Causal Invariant Syn-to-real Generalization, which effectively improves syn-to-real generalization performance. We empirically verify the validity of the proposed methods; in particular, our method achieves state-of-the-art results on visual syn-to-real domain generalization tasks such as image classification and semantic segmentation.
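To make the two ingredients of the abstract concrete, the following is a minimal sketch (not the authors' released code) of how a causal-invariance term and a feature-distillation term could be combined in a single training objective on synthetic data. The invariance term pulls together the features of two style-perturbed views of the same image, and the distillation term keeps the backbone close to a frozen ImageNet-pretrained teacher so that real-domain semantic knowledge is not forgotten. All specifics here (the photometric jitter as the "style" perturbation, the weights `lambda_inv` and `lambda_distill`, the 12-class head) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a combined objective: task loss + style invariance + feature distillation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

student = models.resnet18(weights="IMAGENET1K_V1")
teacher = copy.deepcopy(student).eval()          # frozen real-domain (ImageNet) teacher
for p in teacher.parameters():
    p.requires_grad_(False)

classifier = nn.Linear(1000, 12)                 # e.g. a 12-class syn-to-real benchmark (assumption)
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(classifier.parameters()), lr=1e-3, momentum=0.9
)

# Hypothetical "style" perturbation: photometric jitter only, so content is preserved.
style_augment = transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)

lambda_inv, lambda_distill = 1.0, 0.1            # illustrative trade-off weights


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a synthetic batch (images: [B, 3, H, W], labels: [B])."""
    # Two independently style-perturbed views of the same content.
    view_a, view_b = style_augment(images), style_augment(images)

    feat_a, feat_b = student(view_a), student(view_b)
    logits = classifier(feat_a)

    task_loss = F.cross_entropy(logits, labels)
    # Encourage a style-invariant representation: features should agree across views.
    inv_loss = F.mse_loss(feat_a, feat_b)
    # Distill features from the frozen teacher to retain real-domain semantics.
    with torch.no_grad():
        feat_t = teacher(view_a)
    distill_loss = F.mse_loss(feat_a, feat_t)

    loss = task_loss + lambda_inv * inv_loss + lambda_distill * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design choice mirrored from the abstract is that invariance is enforced only across style variation while content (and hence the label) is held fixed, and that distillation targets a teacher trained on real data rather than the synthetic task itself.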