Deep learning models remain vulnerable to spurious correlations, giving rise to so-called Clever Hans predictors that undermine robustness even in large-scale foundation and self-supervised models. Group distributional robustness methods, such as Deep Feature Reweighting (DFR), rely on explicit group labels to upweight underrepresented subgroups, but face three key limitations: (1) group labels are often unavailable, (2) low within-group sample sizes hinder coverage of the subgroup distribution, and (3) performance degrades sharply when multiple spurious correlations fragment the data into ever smaller groups. We propose Counterfactual Knowledge Distillation (CFKD), a framework that sidesteps these issues by generating diverse counterfactuals, enabling a human annotator to efficiently explore and correct the model's decision boundaries through a knowledge distillation step. Unlike DFR, our method not only reweights undersampled groups but also enriches them with new data points. CFKD requires no confounder labels, scales effectively to multiple confounders, and yields balanced generalization across groups. We demonstrate its efficacy on five datasets, spanning synthetic tasks to an industrial application, with particularly strong gains in low-data regimes with pronounced spurious correlations. Additionally, we provide an ablation study on the choice of counterfactual explainer and teacher model, highlighting their impact on robustness.
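The core loop described above (generate counterfactuals that break a spurious correlation, have an annotator confirm the labels, then retrain the student on the enriched data) can be illustrated in a toy setting. This is a minimal sketch, not the paper's actual pipeline: the logistic-regression student, the permutation-based "counterfactual explainer", and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 is the causal signal, feature 1 is a spurious
# shortcut that correlates with the label more cleanly (lower noise).
n = 200
y = rng.integers(0, 2, n)
s = 2.0 * y - 1.0  # map labels to +/-1 so a bias-free model suffices
X = np.column_stack([
    s + 0.30 * rng.normal(size=n),  # causal feature (noisier)
    s + 0.05 * rng.normal(size=n),  # spurious shortcut (cleaner)
])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, steps=500, lr=0.5):
    """Plain gradient descent on the logistic loss (no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

# 1) A "Clever Hans" student: trained on confounded data, it leans on
#    the cleaner spurious feature.
w_student = train_logreg(X, y)

# 2) Counterfactuals: permute the spurious feature to break its
#    correlation with the label (a stand-in for a real counterfactual
#    explainer). A human annotator would confirm the label is unchanged,
#    since the causal feature is untouched.
X_cf = X.copy()
X_cf[:, 1] = rng.permutation(X_cf[:, 1])
y_cf = y

# 3) Correction step: retrain on the original data enriched with the
#    annotator-approved counterfactuals, not merely reweighted.
X_aug = np.vstack([X, X_cf])
y_aug = np.concatenate([y, y_cf])
w_cfkd = train_logreg(X_aug, y_aug)

# The corrected model should shift weight from the spurious feature
# (index 1) toward the causal one (index 0).
print("student weights:", w_student)
print("CFKD weights:   ", w_cfkd)
```

The permutation step is the simplest possible confounder intervention; a real deployment would replace it with a generative counterfactual explainer and replace the logistic student with the actual model being corrected.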