Despite the remarkable performance that modern deep neural networks achieve on independent and identically distributed (i.i.d.) data, they can fail dramatically under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise given the limited number of available domains. We propose a large-scale benchmark with extensively labeled domains, named NICO++{\ddag}, along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics that quantify covariate shift and concept shift, respectively. We further derive two novel generalization bounds from the perspective of data construction, proving that limited concept shift and significant covariate shift favor evaluation capability for generalization. Through extensive experiments, NICO++ demonstrates superior evaluation capability compared with current DG datasets, and it alleviates the unfairness caused by leakage of oracle knowledge in model selection.
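To make the two kinds of shift concrete, the following toy sketch contrasts covariate shift (a change in the feature distribution P(X) across domains) with concept shift (a change in the labeling rule P(Y|X)). The proxy measures below are illustrative assumptions for this example only, not the metrics proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Domain A: features centered at 0; Domain B: features centered at 2.
# The feature distributions P(X) differ -> covariate shift.
x_a = rng.normal(0.0, 1.0, size=1000)
x_b = rng.normal(2.0, 1.0, size=1000)

# The same labeling rule applies in both domains -> no concept shift here.
label = lambda x: (x > 1.0).astype(int)
y_a, y_b = label(x_a), label(x_b)

def covariate_shift(x_p, x_q):
    """Crude proxy: absolute difference of the feature means of two domains."""
    return abs(x_p.mean() - x_q.mean())

def concept_shift(x_p, y_p, x_q, y_q, threshold=1.0):
    """Crude proxy: difference in P(Y=1 | X > threshold) between two domains."""
    rate = lambda x, y: y[x > threshold].mean()
    return abs(rate(x_p, y_p) - rate(x_q, y_q))

print(covariate_shift(x_a, x_b))            # large: P(X) differs across domains
print(concept_shift(x_a, y_a, x_b, y_b))    # 0.0: P(Y|X) is shared
```

Under the paper's argument, a dataset like this toy pair, with a large covariate shift but no concept shift, is the favorable regime for evaluating generalization.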