For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 and ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
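The kind of analysis described above can be sketched as follows: given per-model (in-distribution, out-of-distribution) accuracy pairs, measure the Pearson correlation and fit a linear trend. The accuracy values below are synthetic numbers for illustration only, not measurements from the paper.

```python
import numpy as np

# Hypothetical (ID, OOD) accuracy pairs for a collection of models --
# synthetic numbers for illustration, not results from the paper.
id_acc = np.array([0.72, 0.80, 0.85, 0.90, 0.93, 0.96])
ood_acc = np.array([0.55, 0.64, 0.70, 0.77, 0.81, 0.86])

# Pearson correlation between ID and OOD accuracy across models.
r = np.corrcoef(id_acc, ood_acc)[0, 1]

# Least-squares linear fit: ood_acc ~ slope * id_acc + intercept.
slope, intercept = np.polyfit(id_acc, ood_acc, 1)

print(f"correlation r = {r:.3f}")
print(f"linear fit: ood = {slope:.2f} * id {intercept:+.2f}")
```

A strong correlation in this sense means the points lie close to a single line, so a model's out-of-distribution accuracy is well predicted by its in-distribution accuracy alone.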