Semi-Supervised Domain Generalization in Real World: New Benchmark and Strong Baseline

Conventional domain generalization aims to learn domain-invariant representations from multiple domains, which requires accurate annotations. In realistic application scenarios, however, collecting and annotating large amounts of data is cumbersome or even infeasible. Web data, in contrast, offers free access to a huge amount of unlabeled data with rich style information that can be harnessed to improve domain generalization. In this paper, we introduce a novel task, termed semi-supervised domain generalization, to study how labeled and unlabeled domains can interact, and we establish two benchmarks, including a web-crawled dataset, which poses a novel yet realistic challenge to push the limits of existing technologies. A straightforward solution to this task is to propagate class information from the labeled to the unlabeled domains via pseudo labeling in conjunction with domain confusion training. Considering that narrowing the domain gap can improve the quality of pseudo labels and, in turn, advance domain-invariant feature learning for generalization, we propose a cycle learning framework that encourages positive feedback between label propagation and domain generalization, favoring an evolving intermediate domain that bridges the labeled and unlabeled domains in a curriculum learning manner. Experiments are conducted to validate the effectiveness of our framework. Notably, our results demonstrate that web-crawled data benefits domain generalization. Our code will be available later.
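As a rough illustration of the pseudo-labeling step mentioned in the abstract, the sketch below keeps only those unlabeled samples whose predicted class probability exceeds a confidence threshold. This is not the authors' implementation: the function names, the example logits, and the 0.9 threshold are all assumptions made for illustration, and domain confusion training and the cycle learning framework are omitted entirely.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits_batch, threshold=0.9):
    """Return (index, label, confidence) for confidently predicted samples.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    selected = []
    for i, logits in enumerate(logits_batch):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            selected.append((i, probs.index(conf), conf))
    return selected


# Hypothetical classifier outputs for three unlabeled samples.
unlabeled_logits = [
    [4.0, 0.0, 0.0],  # strongly favors class 0
    [0.2, 0.1, 0.0],  # ambiguous, should be skipped
    [0.0, 0.0, 5.0],  # strongly favors class 2
]
picked = pseudo_label(unlabeled_logits, threshold=0.9)
```

In a pipeline of this kind, the selected samples would typically be added to the labeled pool for the next training round, so a smaller domain gap (and hence more confident predictions) lets more unlabeled data contribute over time.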
