In this paper we consider the task of semantic segmentation in autonomous driving applications. Specifically, we consider the cross-domain few-shot setting, where training can use only a few annotated real-world images alongside many annotated synthetic images. In this context, aligning the domains is made more challenging by the pixel-wise class imbalance intrinsic to segmentation, which leads to ignoring underrepresented classes and overfitting well-represented ones. We address this problem with a novel framework called Pixel-By-Pixel Cross-Domain Alignment (PixDA). We propose a novel pixel-by-pixel domain adversarial loss following three criteria: (i) align the source and the target domain for each pixel, (ii) avoid negative transfer on the correctly represented pixels, and (iii) regularize the training of infrequent classes to avoid overfitting. The pixel-wise adversarial training is assisted by a novel sample selection procedure, which handles the imbalance between source and target data, and a knowledge distillation strategy, which avoids overfitting towards the few target images. We demonstrate on standard synthetic-to-real benchmarks that PixDA outperforms previous state-of-the-art methods in (1-5)-shot settings.