Reliable machine learning is essential to the practical deployment of deep learning methods. A fundamental challenge is that models are often overconfident, which makes their predictions unreliable. In this paper, we estimate a model's reliability by measuring \emph{the agreement between its latent space and the latent space of a foundation model}. However, measuring the agreement between two different latent spaces is challenging because they are incoherent, \eg, they may differ by arbitrary rotations and in dimensionality. To overcome this incoherence, we design a \emph{neighborhood agreement measure} between latent spaces and find that this agreement is surprisingly well correlated with the reliability of a model's predictions. Further, we show that fusing neighborhood agreement into a model's predictive confidence in a post-hoc manner significantly improves its reliability. Theoretical analysis and extensive experiments on failure detection across various datasets verify the effectiveness of our method in both in-distribution and out-of-distribution settings.
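To make the idea of neighborhood agreement concrete, the following is a minimal illustrative sketch, not the exact measure used in the paper. It assumes that agreement is scored as the Jaccard overlap between each sample's $k$ nearest neighbors (by cosine similarity) in the two latent spaces, and that fusion is a simple convex combination with the model's confidence. The function names, the choice of cosine similarity, the neighbor pool (the batch itself), and the parameters `k` and `weight` are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(features, k):
    """Indices of each row's k nearest neighbors by cosine similarity, excluding itself."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def neighborhood_agreement(model_feats, foundation_feats, k=5):
    """Per-sample Jaccard overlap of k-NN sets computed in two latent spaces.

    Assumption: both arrays describe the same samples in the same row order.
    Only neighbor identities are compared, so the two spaces may differ in
    dimensionality and by any rotation.
    """
    nn_model = knn_indices(model_feats, k)
    nn_found = knn_indices(foundation_feats, k)
    scores = np.empty(len(nn_model))
    for i, (a, b) in enumerate(zip(nn_model, nn_found)):
        shared = len(np.intersect1d(a, b))
        scores[i] = shared / (2 * k - shared)  # |A ∩ B| / |A ∪ B|
    return scores


def fuse_confidence(confidence, agreement, weight=0.5):
    """Hypothetical post-hoc fusion: convex combination of confidence and agreement."""
    return (1.0 - weight) * confidence + weight * agreement
```

Because the score depends only on which samples are neighbors, not on the coordinates themselves, it sidesteps the need to align the two spaces; a low fused score could then be used to flag a prediction as a likely failure.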