Out-of-domain (OOD) generalization is a significant challenge for machine learning models. Many techniques have been proposed to overcome this challenge, often focused on learning models with certain invariance properties. In this work, we draw a link between OOD performance and model calibration, arguing that calibration across multiple domains can be viewed as a special case of an invariant representation leading to better OOD generalization. Specifically, we show that under certain conditions, models which achieve \emph{multi-domain calibration} are provably free of spurious correlations. This leads us to propose multi-domain calibration as a measurable and trainable surrogate for the OOD performance of a classifier. We therefore introduce methods that are easy to apply and allow practitioners to improve multi-domain calibration by training or modifying an existing model, leading to better performance on unseen domains. Using five datasets from the recently proposed WILDS OOD benchmark, as well as the Colored MNIST dataset, we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains. We believe this intriguing connection between calibration and OOD generalization is promising from both a practical and theoretical point of view.
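To make the proposed surrogate concrete, the following is a minimal Python sketch, not the paper's implementation: it measures a standard binned expected calibration error (ECE) separately on each training domain and reports the worst case as a model-selection criterion. The helper names \texttt{expected\_calibration\_error} and \texttt{multi\_domain\_calibration\_score} are hypothetical.

\begin{verbatim}
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Standard binned ECE for a binary classifier.

    probs:  predicted probability of the positive class, shape (n,)
    labels: binary labels in {0, 1}, shape (n,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.sum() == 0:
            continue
        conf = probs[mask].mean()    # mean confidence in the bin
        acc = labels[mask].mean()    # empirical frequency in the bin
        ece += (mask.sum() / len(probs)) * abs(acc - conf)
    return ece

def multi_domain_calibration_score(probs_by_domain, labels_by_domain):
    """Worst-case ECE across training domains: a measurable
    surrogate for OOD performance in the spirit of the abstract."""
    return max(
        expected_calibration_error(p, y)
        for p, y in zip(probs_by_domain, labels_by_domain)
    )
\end{verbatim}

Under this sketch, a lower worst-case score indicates that the model's confidence estimates are consistent across training domains, which the abstract argues is predictive of performance on unseen test domains.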