Despite the excellent average-case performance of many image classifiers, their performance can deteriorate substantially on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups and robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for those on which the target model performs poorly on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We call this procedure PromptAttack, as it can be interpreted as an adversarial attack in the prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. We then apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
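To illustrate the kind of search described above, the following is a minimal sketch, not the paper's implementation. It assumes a torchvision ResNet-50 as the target classifier, the diffusers Stable Diffusion v1.5 checkpoint as the text-to-image model, a small hypothetical attribute grid for the ImageNet class "barn", and a simple greedy pairwise covering array standing in for the combinatorial testing step; all of these are illustrative choices, not taken from the paper.

```python
# Sketch of a PromptAttack-style search over subgroup descriptions (illustrative only).
from itertools import combinations, product

import torch
from diffusers import StableDiffusionPipeline
from torchvision.models import resnet50, ResNet50_Weights


def pairwise_cover(factors):
    """Greedy 2-way covering array: a small set of attribute combinations that
    covers every pair of values across any two attributes. Enumerating the full
    Cartesian product as candidates is fine for small grids like this one."""
    keys = list(factors)
    key_pairs = list(combinations(range(len(keys)), 2))
    uncovered = {(i, vi, j, vj)
                 for i, j in key_pairs
                 for vi in factors[keys[i]]
                 for vj in factors[keys[j]]}
    candidates = list(product(*(factors[k] for k in keys)))
    tests = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered value pairs.
        best = max(candidates, key=lambda t: sum(
            (i, t[i], j, t[j]) in uncovered for i, j in key_pairs))
        tests.append(dict(zip(keys, best)))
        uncovered -= {(i, best[i], j, best[j]) for i, j in key_pairs}
    return tests


device = "cuda" if torch.cuda.is_available() else "cpu"

# Target classifier under test (ImageNet ResNet-50 as an example).
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval().to(device)
preprocess = weights.transforms()

# Text-to-image model used to synthesize subgroup-conditioned data.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Hypothetical attribute grid describing rare subgroups of the class "barn".
target_class = "barn"
target_idx = weights.meta["categories"].index(target_class)
factors = {
    "color": ["red", "white", "black"],
    "setting": ["in a snowy field", "at night", "in dense fog"],
    "style": ["a photo of", "a wide-angle photo of", "a close-up photo of"],
}

results = []
for combo in pairwise_cover(factors):  # combinatorial testing over subgroups
    prompt = f"{combo['style']} a {combo['color']} {target_class} {combo['setting']}"
    correct, n = 0, 8  # small per-prompt sample size, just for the sketch
    for seed in range(n):
        g = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, generator=g, num_inference_steps=30).images[0]
        x = preprocess(image).unsqueeze(0).to(device)
        with torch.no_grad():
            pred = model(x).argmax(dim=1).item()
        correct += int(pred == target_idx)
    results.append((correct / n, prompt))

# Prompts with the lowest accuracy are candidate systematic errors.
for acc, prompt in sorted(results)[:5]:
    print(f"{acc:.2f}  {prompt}")
```

The covering array keeps the number of evaluated prompts well below the full Cartesian product of attribute values while still exercising every pair of attribute values at least once; the lowest-accuracy prompts would then be inspected manually to confirm whether they correspond to genuine systematic errors rather than generation artifacts.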