Deep learning has achieved tremendous success with independent and identically distributed (i.i.d.) data. However, the performance of neural networks often degrades drastically on out-of-distribution (OoD) data, i.e., when training and test data are sampled from different distributions. While a plethora of algorithms have been proposed to tackle OoD generalization, our understanding of the data used to train and evaluate these algorithms remains stagnant. In this work, we place the datasets and algorithms of several seemingly unconnected research areas (e.g., domain generalization, stable learning, invariant risk minimization) into a single coherent picture. First, we identify and measure two distinct kinds of distribution shift that are ubiquitous across a wide range of datasets. Next, we compare various OoD generalization algorithms on a new benchmark dominated by these two kinds of shift. Through extensive experiments, we show that existing OoD algorithms that outperform empirical risk minimization under one kind of distribution shift usually have limitations under the other. The new benchmark may serve as a strong foothold for future OoD generalization research.
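As a brief formal sketch of the setting described above (the symbols P_tr, P_te, f_theta, and the loss ell are our own shorthand, not notation taken from the paper): a model fitted by empirical risk minimization on the training distribution is evaluated under a different test distribution.

```latex
% Minimal sketch under assumed notation: ERM fits f_theta to the training
% distribution P_tr, but OoD performance is measured under P_te != P_tr.
\hat{\theta} = \arg\min_{\theta} \;
  \mathbb{E}_{(x,y)\sim P_{\mathrm{tr}}}\big[\ell(f_\theta(x), y)\big],
\qquad
\text{OoD risk:}\;\;
  \mathbb{E}_{(x,y)\sim P_{\mathrm{te}}}\big[\ell(f_{\hat{\theta}}(x), y)\big],
\quad P_{\mathrm{te}} \neq P_{\mathrm{tr}}.
```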