Trustworthy machine learning is of primary importance to the practical deployment of deep learning models. While state-of-the-art models achieve remarkably high accuracy, recent literature reveals that their predictive confidence scores unfortunately cannot be trusted: for example, they are often overconfident when making wrong predictions, and remain so even on obvious outliers. In this paper, we introduce a new approach, self-supervised probing, which enables us to check and mitigate the overconfidence issue of a trained model, thereby improving its trustworthiness. We provide a simple yet effective framework that can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner. Extensive experiments on three trustworthiness-related tasks (misclassification detection, calibration, and out-of-distribution detection) across various benchmarks verify the effectiveness of our proposed probing framework.