Despite the remarkable performance that modern deep neural networks achieve on independent and identically distributed (i.i.d.) data, they can fail dramatically under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise for the limited number of available domains. We propose a large-scale benchmark with extensive labeled domains, named NICO++, along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics that quantify covariate shift and concept shift, respectively. We further derive two novel generalization bounds from the perspective of data construction, proving that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ demonstrates superior evaluation capability compared with current DG datasets, and it helps alleviate the unfairness caused by leaks of oracle knowledge in model selection.