Large-scale pre-trained models have achieved remarkable success in a wide range of scenarios and applications, but how to leverage them to improve the prediction reliability of downstream models remains under-explored. Moreover, modern neural networks have been found to be poorly calibrated, making overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to use large-scale pre-trained models to guide downstream model training with a sample-difficulty-aware entropy regularization. Because such pre-trained models have been exposed to large-scale datasets and do not overfit the downstream training classes, they allow us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident predictions according to each sample's difficulty, we simultaneously improve accuracy and uncertainty calibration across a range of challenging benchmarks, consistently surpassing competitive baselines for reliable prediction.
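The difficulty measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the relative Mahalanobis distance is the class-conditional Mahalanobis distance (per-class means, shared within-class covariance, as in standard Mahalanobis OOD scoring) minus the Mahalanobis distance under a single class-agnostic "background" Gaussian fit to all features. The function name and regularization details are hypothetical.

```python
import numpy as np

def relative_mahalanobis(feats, labels, query, eps=1e-6):
    """Relative Mahalanobis distance of `query` in a pre-trained feature space.

    feats:  (N, D) array of pre-trained-model features for training samples
    labels: (N,) integer class labels
    query:  (D,) feature vector whose difficulty we want to score
    Larger return value => more atypical for its nearest class => harder sample.
    """
    classes = np.unique(labels)
    d = feats.shape[1]

    # Class-conditional Gaussians: per-class means, shared within-class covariance.
    mu_c = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([feats[labels == c] - mu_c[c] for c in classes])
    cov = centered.T @ centered / len(feats)
    prec = np.linalg.inv(cov + eps * np.eye(d))

    # Class-agnostic background Gaussian over all features.
    mu_bg = feats.mean(axis=0)
    prec_bg = np.linalg.inv(np.cov(feats, rowvar=False) + eps * np.eye(d))

    # Mahalanobis distance to the nearest class, minus distance to the background.
    d_class = min((query - mu_c[c]) @ prec @ (query - mu_c[c]) for c in classes)
    d_bg = (query - mu_bg) @ prec_bg @ (query - mu_bg)
    return d_class - d_bg
```

In the regularization scheme the abstract describes, such a score would then weight an entropy penalty per sample, so that harder samples are pushed toward higher-entropy (less confident) predictions than easy ones.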