Uncertainty quantification is essential for the reliable deployment of machine learning models in high-stakes application domains. It becomes all the more challenging when the training and test distributions differ, even when the distribution shifts are mild. Despite the ubiquity of distribution shifts in real-world applications, existing uncertainty quantification approaches mainly study the in-distribution setting, where the training and test distributions are the same. In this paper, we develop a systematic calibration model that handles distribution shifts by leveraging data from multiple domains. Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity across domains to improve calibration robustness under distribution shift. Through experiments on three benchmark datasets, we find that our proposed method outperforms existing methods on both in-distribution and out-of-distribution test sets.
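
To make the idea concrete, below is a minimal sketch of temperature scaling extended to multiple domains. The helpers `fit_temperature` and `fit_multi_domain`, along with the per-domain lists of validation logits and labels, are hypothetical names for illustration; the paper's actual estimator may combine the per-domain temperatures differently. The sketch only shows the core step: fitting one temperature per domain by minimizing the negative log-likelihood.

```python
# Sketch: per-domain temperature scaling (assumes PyTorch; names are illustrative).
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Standard temperature scaling: find T > 0 minimizing NLL of logits / T."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

def fit_multi_domain(logits_by_domain, labels_by_domain):
    """Fit one temperature per domain. The heterogeneity across the resulting
    temperatures can then be exploited (e.g., by aggregating them or relating
    them to domain features) to calibrate a shifted test domain."""
    return [fit_temperature(l, y)
            for l, y in zip(logits_by_domain, labels_by_domain)]
```

The design choice of optimizing log T rather than T directly is a common trick that enforces positivity without a constrained optimizer.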