While several methodologies have been proposed for the daunting task of domain generalization, understanding what makes this task challenging has received little attention. Here we present SemanticDG (Semantic Domain Generalization): a benchmark with 15 photo-realistic domains that share the same geometry, scene layout, and camera parameters as the popular 3D ScanNet dataset, but with controlled domain shifts in lighting, materials, and viewpoints. Using this benchmark, we investigate the impact of each of these semantic shifts on generalization independently. We find that visual recognition models generalize easily to novel lighting but struggle with distribution shifts in materials and viewpoints. Inspired by human vision, we hypothesize that scene context can serve as a bridge to help models generalize across material and viewpoint domain shifts, and we propose a context-aware vision transformer together with a contrastive loss over material and viewpoint changes to address these shifts. Our approach (dubbed CDCNet) outperforms existing domain generalization methods by over 18%. As a critical benchmark, we also conduct psychophysics experiments and find that humans generalize equally well across lighting, materials, and viewpoints. The benchmark and computational model introduced here help characterize the challenges associated with generalization across domains and provide initial steps towards extrapolation to semantic distribution shifts. We include all data and source code in the supplement.
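The abstract mentions a contrastive loss over material and viewpoint changes. The paper's exact formulation is not given here, so the following is only an illustrative sketch of a standard InfoNCE-style contrastive loss under the assumption that embeddings of the same scene rendered in two different domains (e.g. one material variant, one viewpoint variant) form the positive pairs, while other scenes in the batch serve as negatives. All names (`info_nce_loss`, the toy embeddings) are hypothetical.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Illustrative InfoNCE loss: z_a[i] and z_b[i] are embeddings of the
    same scene under two domain shifts (positive pair); every other row in
    the batch acts as a negative. Not the paper's exact formulation."""
    # L2-normalize embeddings so similarities are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature        # (B, B); diagonal = positives
    # Cross-entropy with targets on the diagonal (numerically stabilized)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy usage: 4 scenes, 8-dim embeddings from two domains of each scene.
rng = np.random.default_rng(0)
z_material = rng.standard_normal((4, 8))
z_viewpoint = rng.standard_normal((4, 8))
loss = info_nce_loss(z_material, z_viewpoint)
```

Minimizing such a loss pulls representations of the same scene together across material and viewpoint shifts, which is one plausible way to encourage domain-invariant features.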