A key reason for the lack of reliability of deep neural networks in the real world is their heavy reliance on {\it spurious} input features that are causally unrelated to the true label. Focusing on image classification, we define causal attributes as the set of visual features that are always a part of the object, while spurious attributes are those that are likely to {\it co-occur} with the object but are not a part of it (e.g., attribute ``fingers'' for class ``band aid''). Traditional methods for discovering spurious features either require extensive human annotations (and thus are not scalable) or apply only to specific models. In this work, we introduce a {\it scalable} framework to discover a subset of spurious and causal visual attributes used in the inferences of a general model and to localize them on a large number of images with minimal human supervision. Our methodology is based on this key idea: to identify spurious or causal \textit{visual attributes} used in model predictions, we identify spurious or causal \textit{neural features} (penultimate layer neurons of a robust model) via limited human supervision (e.g., using the top 5 activating images per feature). We then show that these neural feature annotations {\it generalize} extremely well to many more images {\it without} any human supervision. We use the activation maps for these neural features as soft masks to highlight spurious or causal visual attributes. Using this methodology, we introduce the {\it Causal Imagenet} dataset, containing causal and spurious masks for a large set of samples from Imagenet. We assess the performance of several popular Imagenet models and show that they rely heavily on various spurious features in their predictions.
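The two mechanical steps named above (picking the top activating images for each neural feature, and turning a feature's activation map into a soft mask) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_k_images` and `soft_mask`, the array layouts, the clipping of negative activations, and the nearest-neighbor upsampling are all assumptions made for illustration.

```python
import numpy as np


def top_k_images(feature_activations, k=5):
    """Return, for each neural feature, the indices of its k most activating images.

    feature_activations: array of shape (n_images, n_features), assumed to hold
    penultimate-layer activations (hypothetical layout).
    Result has shape (n_features, k), most activating image first.
    """
    order = np.argsort(-feature_activations, axis=0)  # descending within each column
    return order[:k].T


def soft_mask(activation_map, out_size):
    """Turn one feature's 2D activation map into a soft mask with values in [0, 1].

    Negative activations are clipped to zero, the map is scaled by its peak,
    and it is upsampled to out_size = (height, width) by nearest-neighbor
    lookup (an assumed, deliberately simple interpolation).
    """
    amap = np.maximum(np.asarray(activation_map, dtype=float), 0.0)
    peak = amap.max()
    if peak > 0:
        amap = amap / peak
    h, w = amap.shape
    out_h, out_w = out_size
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return amap[np.ix_(rows, cols)]
```

In this sketch, a human would inspect the images returned by `top_k_images` to label a feature as causal or spurious, and `soft_mask` would then be applied to that feature's activation map on any other image to highlight the corresponding visual attribute.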