Generalizing visual recognition models trained on a single distribution to unseen input distributions (i.e., domains) requires making them robust to superfluous correlations in the training set. In this work, we achieve this goal by altering the training images to simulate new domains and by imposing consistent visual attention across the different views of the same sample. We find that the first objective can be met simply and effectively through visual corruptions. Specifically, we alter the content of the training images using the nineteen corruptions of the ImageNet-C benchmark and three additional transformations based on the Fourier transform. Since these corruptions preserve object locations, we propose an attention consistency loss that aligns the class activation maps of the original and corrupted versions of the same training sample. We name our model Attention Consistency on Visual Corruptions (ACVC). We show that ACVC consistently achieves state-of-the-art performance on three single-source domain generalization benchmarks: PACS, COCO, and the large-scale DomainNet.
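The attention consistency idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the function name is hypothetical, and the choice of a Jensen-Shannon divergence between spatially normalized class activation maps is one plausible instantiation of "aligning" the maps.

```python
import numpy as np

def attention_consistency_loss(cam_orig, cam_corrupt, eps=1e-8):
    """Penalize disagreement between CAMs of original and corrupted views.

    cam_orig, cam_corrupt: arrays of shape (B, H, W) holding the class
    activation maps for the ground-truth class of each sample.
    Illustrative sketch: normalize each map into a spatial probability
    distribution, then take a symmetric Jensen-Shannon divergence.
    """
    def to_prob(cam):
        # flatten spatial dims and apply a softmax over locations
        flat = cam.reshape(cam.shape[0], -1)
        e = np.exp(flat - flat.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    p, q = to_prob(cam_orig), to_prob(cam_corrupt)
    m = 0.5 * (p + q)  # mixture distribution for the JS divergence
    kl = lambda a, b: (a * (np.log(a + eps) - np.log(b + eps))).sum(axis=1)
    return 0.5 * (kl(p, m) + kl(q, m)).mean()
```

Because the corruptions preserve object locations, a loss of this shape is well defined: the two maps are expected to peak at the same spatial positions, and the divergence is zero when they coincide.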