Out-of-domain (OOD) generalization is a significant challenge for machine learning models. Many techniques have been proposed to overcome this challenge, often focused on learning models with certain invariance properties. In this work, we draw a link between OOD performance and model calibration, arguing that calibration across multiple domains can be viewed as a special case of an invariant representation leading to better OOD generalization. Specifically, we show that under certain conditions, models which achieve \emph{multi-domain calibration} are provably free of spurious correlations. This leads us to propose multi-domain calibration as a measurable and trainable surrogate for the OOD performance of a classifier. We therefore introduce methods that are easy to apply and allow practitioners to improve multi-domain calibration by training or modifying an existing model, leading to better performance on unseen domains. Using four datasets from the recently proposed WILDS OOD benchmark, as well as the Colored MNIST dataset, we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains. We believe this intriguing connection between calibration and OOD generalization is promising from both a practical and theoretical point of view.
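To make the notion of multi-domain calibration concrete, the sketch below shows one simple way a practitioner might measure it: compute a standard binned expected calibration error (ECE) separately on each training domain and track the worst case. This is only an illustrative diagnostic under our own assumptions (the function names and the binned ECE estimator are not taken from the paper), not the training or model-modification procedure proposed in this work.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: group predictions by confidence and compare
    average confidence to empirical accuracy within each bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Weight each bin's calibration gap by the fraction of samples it holds.
            ece += in_bin.mean() * abs(accuracies[in_bin].mean()
                                       - confidences[in_bin].mean())
    return ece

def multi_domain_calibration_gap(probs_by_domain, labels_by_domain):
    """Worst-case ECE across training domains: a classifier that is
    calibrated in every domain should score low on this quantity."""
    return max(
        expected_calibration_error(p, y)
        for p, y in zip(probs_by_domain, labels_by_domain)
    )
```

A small per-domain gap is the measurable surrogate the abstract refers to; in practice one could monitor it during training or after post-hoc recalibration of an existing model.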