A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same. Despite the prevalence of covariate shift in real-world applications, a theoretical understanding in the context of modern machine learning has remained lacking. In this work, we examine the exact high-dimensional asymptotics of random feature regression under covariate shift and present a precise characterization of the limiting test error, bias, and variance in this setting. Our results motivate a natural partial order over covariate shifts that provides a sufficient condition for determining when the shift will harm (or even help) test performance. We find that overparameterized models exhibit enhanced robustness to covariate shift, providing one of the first theoretical explanations for this intriguing phenomenon. Additionally, our analysis reveals an exact linear relationship between in-distribution and out-of-distribution generalization performance, offering an explanation for this surprising recent empirical observation.