Covariate shift generalization, a typical case of out-of-distribution (OOD) generalization, requires good performance on an unknown test distribution that differs from the accessible training distribution in the form of covariate shift. Recently, stable learning algorithms have shown empirical effectiveness in dealing with covariate shift generalization across several learning models, including regression algorithms and deep neural networks. However, a theoretical explanation for this effectiveness is still missing. In this paper, we take a step toward a theoretical analysis of stable learning algorithms by interpreting them as feature selection processes. We first specify a set of variables, named the minimal stable variable set, that is minimal and optimal for dealing with covariate shift generalization under common loss functions, including the mean squared loss and the binary cross-entropy loss. We then prove that, under ideal conditions, stable learning algorithms can identify the variables in this set. Further analyses of asymptotic properties and error propagation are also provided. These results shed light on why stable learning works for covariate shift generalization.