Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet

The benchmark performance of a deep learning classifier is not, on its own, a reliable predictor of the performance of a deployed model. In particular, if the image classifier has picked up spurious features in the training data, its predictions can fail in unexpected ways. In this paper, we develop a framework that allows us to systematically identify spurious features in large datasets like ImageNet. It is based on our neural PCA components and their visualization. Previous work on spurious features of image classifiers often operates in toy settings or requires costly pixel-wise annotations. In contrast, we validate our results by checking that the presence of the harmful spurious feature of a class is sufficient to trigger the prediction of that class. We introduce a novel dataset, "Spurious ImageNet", and check how much existing classifiers rely on spurious features.
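The abstract does not spell out how the neural PCA components are computed, so the following is only a minimal sketch of one plausible reading: ordinary PCA over the feature vectors of the images of a single class, followed by a ranking of the images that most strongly activate a component (the kind of list one might then inspect visually). The function names `class_pca_components` and `top_activating`, the synthetic features, and the planted direction are all hypothetical and stand in for real penultimate-layer activations; this is not the authors' implementation.

```python
import numpy as np


def class_pca_components(features, n_components=3):
    """Plain PCA over the feature vectors of one class (illustrative only).

    features: (n_samples, n_dims) array, e.g. penultimate-layer activations.
    Returns the mean, the principal directions, and per-sample scores.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components].copy()
    scores = centered @ components.T
    # SVD signs are arbitrary: orient each component so that its
    # largest-magnitude score is positive.
    for i in range(components.shape[0]):
        j = np.argmax(np.abs(scores[:, i]))
        if scores[j, i] < 0:
            components[i] *= -1
            scores[:, i] *= -1
    return mean, components, scores


def top_activating(scores, component, k=5):
    """Indices of the k samples with the highest score on one component."""
    order = np.argsort(scores[:, component])[::-1]
    return order[:k]


# Synthetic stand-in for the features of one class (assumption: 200 images,
# 16-dimensional features). The first 20 samples share a strong extra
# direction, mimicking a feature that co-occurs with part of the class.
rng = np.random.default_rng(0)
n_samples, n_dims = 200, 16
features = rng.normal(size=(n_samples, n_dims))
features[:20, 0] += 8.0

mean, components, scores = class_pca_components(features)
top = top_activating(scores, component=0, k=5)
print("samples that most activate component 0:", top)
```

In a real pipeline the ranked images would be inspected by a person to decide whether the component corresponds to the class object or to a spurious co-occurring feature; that judgment step is not something this sketch automates.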

Related content

ImageNet (dataset)


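The ImageNet project is a large visual database designed for research on visual object recognition software. More than 14 million image URLs have been hand-annotated by ImageNet to indicate which objects appear in each picture; for at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories; a typical category, such as "balloon" or "strawberry", contains several hundred images. The database of annotations for third-party image URLs is freely available directly from ImageNet, although the images themselves are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of 1,000 non-overlapping classes. A major breakthrough on the ImageNet challenge in 2012 is widely regarded as the start of the deep learning revolution of the 2010s.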
