We introduce two challenging datasets that reliably cause machine learning model performance to substantially degrade. The datasets are collected with a simple adversarial filtration technique that limits spurious cues. Our datasets' real-world, unmodified examples transfer reliably to various unseen models, demonstrating that computer vision models have shared weaknesses. The first dataset is called ImageNet-A and is like the ImageNet test set, but far more challenging for existing models. We also curate an adversarial out-of-distribution detection dataset called ImageNet-O, the first out-of-distribution detection dataset created for ImageNet models. On ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%, and its out-of-distribution detection performance on ImageNet-O is near random chance levels. We find that existing data augmentation techniques hardly boost performance, and that training on other public datasets provides only limited improvement. However, we find that improvements to computer vision architectures provide a promising path towards robust models.
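The adversarial filtration idea can be stated in a few lines: candidate images are passed through a fixed pretrained classifier, and only the examples it misclassifies are retained. Below is a minimal sketch in PyTorch, assuming torchvision's pretrained ResNet-50 as the filtering model; the names `candidates` and `is_adversarially_filtered` are illustrative, and the actual curation pipeline involves additional confidence-based and manual checks beyond this single test.

```python
# Minimal sketch of adversarial filtration: keep only candidate images that a
# fixed pretrained classifier gets wrong. Illustrative only, not the paper's
# exact pipeline.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Fixed filtering model (the paper filters ImageNet-A with ResNet-50).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing for the pretrained weights.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def is_adversarially_filtered(image_path: str, true_label: int) -> bool:
    """Return True if the fixed classifier misclassifies the image,
    i.e. the example survives adversarial filtration."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)          # shape: (1, 3, 224, 224)
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()  # predicted ImageNet class index
    return pred != true_label

# Usage: `candidates` is a hypothetical list of (path, label) pairs;
# keep only the examples the filtering model fails on.
# hard_examples = [(p, y) for p, y in candidates if is_adversarially_filtered(p, y)]
```

Because the retained examples are real, unmodified images rather than perturbed inputs, the errors they induce tend to transfer to models other than the one used for filtering.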