Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain. Recent work suggests that the notion of disagreement, the degree to which two models trained with different randomness differ on the same input, is key to tackling this problem. Empirically, disagreement and prediction error have been shown to be strongly correlated, and this connection has been used to estimate model performance. Experiments have led to the discovery of the disagreement-on-the-line phenomenon: the classification error on the target domain is often a linear function of the classification error on the source domain, and whenever this property holds, disagreement on the source and target domains follows the same linear relation. In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression, and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.