Machine learning systems are often applied to data drawn from a different distribution than the training distribution. Recent work has shown that, for a variety of classification and signal reconstruction problems, out-of-distribution performance is strongly linearly correlated with in-distribution performance. If this relationship, or more generally a monotonic one, holds, it has important consequences: for example, it allows one to optimize performance on one distribution as a proxy for performance on the other. In this paper, we study conditions under which a monotonic relationship between the performances of a model on two distributions is expected. We prove an exact asymptotic linear relation for squared error and a monotonic relation for misclassification error for ridge-regularized general linear models under covariate shift, as well as an approximate linear relation for linear inverse problems.
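As a minimal illustrative sketch (not the paper's exact asymptotic setting), the linear relation for ridge regression under covariate shift can be probed numerically: fix a ground-truth linear model, train ridge estimators across a sweep of regularization strengths, and compare in-distribution and out-of-distribution squared errors. The specific choices below (Gaussian covariates, an anisotropic scaling for the shifted distribution, noise level 0.1, and the problem dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 50, 200, 5000

# Ground-truth linear model (assumed for illustration).
beta = rng.normal(size=d) / np.sqrt(d)

# Covariate shift: P(y|x) is fixed, but the covariate distribution changes.
# ID covariates are isotropic Gaussian; OOD covariates are anisotropically scaled.
scales_ood = np.linspace(0.5, 2.0, d)

def sample(n, scales=None):
    X = rng.normal(size=(n, d))
    if scales is not None:
        X = X * scales  # broadcasts per-coordinate scaling
    y = X @ beta + 0.1 * rng.normal(size=n)
    return X, y

X_train, y_train = sample(n)
X_id, y_id = sample(n_test)
X_ood, y_ood = sample(n_test, scales_ood)

# Sweep the ridge penalty and record ID and OOD squared errors.
for lam in np.logspace(-3, 3, 13):
    w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                        X_train.T @ y_train)
    err_id = np.mean((X_id @ w - y_id) ** 2)
    err_ood = np.mean((X_ood @ w - y_ood) ** 2)
    print(f"lambda={lam:8.3f}  ID MSE={err_id:.4f}  OOD MSE={err_ood:.4f}")
```

Plotting the recorded OOD errors against the ID errors across the sweep should trace out an approximately linear trend, consistent with the relation the abstract describes.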