Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, a model that associates cats with bed backgrounds may fail to recognize cats in environments without beds. Mitigating spurious correlations is crucial for building trustworthy models. However, existing methods lack the transparency to offer insight into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. With human-interpretable concepts, DISC iteratively 1) discovers unstable concepts across different environments as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlation. Across systematic experiments, DISC provides better generalization and interpretability than existing approaches. Specifically, it outperforms the state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we provide theoretical analysis and guarantees to explain the benefits of models trained with DISC. Code and data are available at https://github.com/Wuyxin/DISC.
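To make the two-step loop concrete, below is a minimal, self-contained sketch of a discover-then-cure iteration on synthetic data. Every name (make_env, discover, cure, etc.), the variance-of-correlation instability proxy, and the shuffle-based intervention are illustrative assumptions for exposition, not the actual DISC implementation or API.

```python
"""Toy sketch of a Discover-and-Cure style loop on synthetic data."""
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: feature 0 is causal for the binary label y; feature 1 is a
# "concept" that correlates with y only as strongly as spurious_strength.
def make_env(n, spurious_strength):
    y = rng.integers(0, 2, n)
    causal = y + 0.3 * rng.standard_normal(n)
    concept = np.where(rng.random(n) < spurious_strength, y, rng.integers(0, 2, n))
    return np.stack([causal, concept + 0.3 * rng.standard_normal(n)], axis=1), y

def fit(X, y):
    # Least-squares "model": its weights show which features it relies on.
    return np.linalg.lstsq(X, y - 0.5, rcond=None)[0]

def discover(envs):
    # 1) Discover: flag the concept whose association with the label is most
    #    unstable across environments (variance of per-environment correlation
    #    is used as an assumed proxy, not the paper's exact metric).
    corr = np.array([[np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
                     for X, y in envs])
    return int(np.argmax(corr.var(axis=0)))

def cure(X, y, concept_idx):
    # 2) Cure: intervene on the training data so the discovered concept no
    #    longer predicts the label (here, by shuffling that column).
    X = X.copy()
    X[:, concept_idx] = rng.permutation(X[:, concept_idx])
    return X, y

# Training env has a strong spurious correlation; other envs progressively less.
X_tr, y_tr = make_env(2000, spurious_strength=0.95)
envs = [make_env(500, s) for s in (0.95, 0.5, 0.1)]

w = fit(X_tr, y_tr)
for _ in range(3):                       # iterate: discover -> cure -> retrain
    spurious = discover(envs)
    X_tr, y_tr = cure(X_tr, y_tr, spurious)
    w = fit(X_tr, y_tr)

print("final weights (causal, concept):", np.round(w, 3))
```

After the loop, the weight on the spurious concept shrinks toward zero while the causal weight is preserved, mirroring the intended effect of the discover-and-cure iterations described above.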