The mismatch between training and target data is one major challenge for current machine learning systems. When training data is collected from multiple domains and the target domains include all training domains as well as new, unseen domains, we face an Out-of-Distribution (OOD) generalization problem, which aims to find a model with the best OOD accuracy. One common definition of OOD accuracy is worst-domain accuracy. In general, the set of target domains is unknown, and the worst-case target domain may be unobserved when the number of observed domains is limited. In this paper, we show that the worst accuracy over the observed domains may dramatically fail to identify the true OOD accuracy. To this end, we introduce the Influence Function, a classical tool from robust statistics, into the OOD generalization problem and propose the variance of the influence function as an index to monitor the stability of a model across training domains. We show that the accuracy on test domains, together with the proposed index, can help us discern whether OOD algorithms are needed and whether a model achieves good OOD generalization.
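For concreteness, a minimal sketch of the quantities involved (the exact index used in the paper may be defined differently; the symbols \(\mathcal{E}_{\mathrm{tr}}\), \(D_e\), and \(\mathcal{V}\) below are illustrative notation, not the paper's): the classical influence function of a training example \(z\) on the empirical risk minimizer \(\hat{\theta}\) is
\[
\mathcal{I}(z) \;=\; -\,H_{\hat{\theta}}^{-1}\,\nabla_{\theta}\,\ell(z,\hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}\,\ell(z_i,\hat{\theta}),
\]
and one natural variance-style stability index over the training domains \(\mathcal{E}_{\mathrm{tr}}\) is
\[
\mathcal{V}(\hat{\theta}) \;=\; \operatorname{Var}_{e\in\mathcal{E}_{\mathrm{tr}}}\!\left[\frac{1}{|D_e|}\sum_{z\in D_e}\mathcal{I}(z)\right],
\]
where \(D_e\) is the training set of domain \(e\) and the variance of the vector-valued average influence is summarized by a scalar (e.g., coordinate-wise or via a norm). A small \(\mathcal{V}(\hat{\theta})\) would indicate that perturbing data from any one training domain moves the model similarly, i.e., the model is stable across training domains.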