Neural networks are prone to learning shortcuts -- they often model simple correlations, ignoring more complex ones that potentially generalize better. Prior works on image classification show that instead of learning a connection to object shape, deep classifiers tend to exploit spurious correlations with low-level texture or the background for solving the classification task. In this work, we take a step towards more robust and interpretable classifiers that explicitly expose the task's causal structure. Building on current advances in deep generative modeling, we propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background; hence, they allow for generating counterfactual images. We demonstrate the ability of our model to generate such images on MNIST and ImageNet. Further, we show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task, despite being synthetic. Lastly, our generative model can be trained efficiently on a single GPU, exploiting common pre-trained models as inductive biases.
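To make the decomposition concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how independently generated components can be composed into a counterfactual image: an object shape mask, an object texture, and a background are produced by separate mechanisms and blended, so swapping any single component changes only that factor. The generator functions `sample_shape_mask`, `sample_texture`, and `sample_background` are hypothetical placeholders standing in for the learned causal mechanisms.

```python
# Minimal sketch: composing an image from three independent components.
# The three sampling functions below are hypothetical stand-ins for the
# learned mechanisms; only the composition step is shown faithfully.
import numpy as np

H, W = 64, 64
rng = np.random.default_rng(0)

def sample_shape_mask(h, w):
    # Hypothetical stand-in: a soft circular mask in [0, 1] marking the object region.
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    return np.clip(1.0 - dist / (min(h, w) / 3), 0.0, 1.0)

def sample_texture(h, w):
    # Hypothetical stand-in: random RGB noise representing object texture.
    return rng.uniform(0.0, 1.0, size=(h, w, 3))

def sample_background(h, w):
    # Hypothetical stand-in: a smooth vertical gradient as the background.
    return np.broadcast_to(np.linspace(0.0, 1.0, h)[:, None, None], (h, w, 3)).copy()

# Composition (alpha blending): because the components are generated
# independently, replacing e.g. the background yields a counterfactual in
# which shape and texture stay fixed while only the background changes.
mask = sample_shape_mask(H, W)[..., None]   # (H, W, 1)
foreground = sample_texture(H, W)           # (H, W, 3)
background = sample_background(H, W)        # (H, W, 3)

counterfactual = mask * foreground + (1.0 - mask) * background
print(counterfactual.shape)  # (64, 64, 3)
```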