This work aims to assess how well a model performs under distribution shift without access to labels. While recent methods focus on prediction confidence, we find that prediction dispersity is another informative cue. Confidence reflects whether an individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should yield predictions with both high confidence and high dispersity; that is, both properties must be considered to make accurate estimates. To this end, we use the nuclear norm, which has been shown to characterize both properties effectively. Extensive experiments validate the effectiveness of the nuclear norm across various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shift (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitations of the nuclear norm, study an improved variant under severe class imbalance, and discuss potential directions.
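The idea above can be illustrated with a minimal sketch: stack the model's softmax outputs on an unlabeled test set into an N x K prediction matrix and take its nuclear norm. Confident rows (near one-hot) and dispersed column mass across classes both raise the score. Note this is an assumed formulation for illustration; the normalization by the upper bound sqrt(min(N, K) * N), which maps the score into (0, 1], is one plausible choice and not necessarily the exact one used in the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nuclear_norm_score(logits):
    """Label-free performance proxy: nuclear norm (sum of singular
    values) of the N x K softmax prediction matrix, normalized by
    its upper bound sqrt(min(N, K) * N).  (Normalization choice is
    an assumption for this sketch.)"""
    probs = softmax(logits)
    n, k = probs.shape
    return np.linalg.norm(probs, ord="nuc") / np.sqrt(min(n, k) * n)
```

As a sanity check, confident predictions spread evenly over all classes score near 1, uniform (unconfident) predictions score low, and confident predictions collapsed onto a single class score in between, matching the intuition that both high confidence and high dispersity are needed.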