Despite the excellent average-case performance of many image classifiers, their performance can deteriorate substantially on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups and robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for those on which the target model performs poorly on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We call this procedure PromptAttack, as it can be interpreted as an adversarial attack in the prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. We then apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
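To illustrate the kind of search described above, the following is a minimal sketch, not the paper's implementation. It assumes a torchvision ResNet-50 as the target classifier, the diffusers Stable Diffusion v1.5 checkpoint as the text-to-image model, a small hypothetical attribute grid for the ImageNet class "barn", and a simple greedy pairwise covering array standing in for the combinatorial testing step; all of these are illustrative choices, not taken from the paper.

```python
# Sketch of a PromptAttack-style search over subgroup descriptions (illustrative only).
from itertools import combinations, product

import torch
from diffusers import StableDiffusionPipeline
from torchvision.models import resnet50, ResNet50_Weights


def pairwise_cover(factors):
    """Greedy 2-way covering array: a small set of attribute combinations that
    covers every pair of values across any two attributes. Enumerating the full
    Cartesian product as candidates is fine for small grids like this one."""
    keys = list(factors)
    key_pairs = list(combinations(range(len(keys)), 2))
    uncovered = {(i, vi, j, vj)
                 for i, j in key_pairs
                 for vi in factors[keys[i]]
                 for vj in factors[keys[j]]}
    candidates = list(product(*(factors[k] for k in keys)))
    tests = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered value pairs.
        best = max(candidates, key=lambda t: sum(
            (i, t[i], j, t[j]) in uncovered for i, j in key_pairs))
        tests.append(dict(zip(keys, best)))
        uncovered -= {(i, best[i], j, best[j]) for i, j in key_pairs}
    return tests


device = "cuda" if torch.cuda.is_available() else "cpu"

# Target classifier under test (ImageNet ResNet-50 as an example).
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval().to(device)
preprocess = weights.transforms()

# Text-to-image model used to synthesize subgroup-conditioned data.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Hypothetical attribute grid describing rare subgroups of the class "barn".
target_class = "barn"
target_idx = weights.meta["categories"].index(target_class)
factors = {
    "color": ["red", "white", "black"],
    "setting": ["in a snowy field", "at night", "in dense fog"],
    "style": ["a photo of", "a wide-angle photo of", "a close-up photo of"],
}

results = []
for combo in pairwise_cover(factors):  # combinatorial testing over subgroups
    prompt = f"{combo['style']} a {combo['color']} {target_class} {combo['setting']}"
    correct, n = 0, 8  # small per-prompt sample size, just for the sketch
    for seed in range(n):
        g = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, generator=g, num_inference_steps=30).images[0]
        x = preprocess(image).unsqueeze(0).to(device)
        with torch.no_grad():
            pred = model(x).argmax(dim=1).item()
        correct += int(pred == target_idx)
    results.append((correct / n, prompt))

# Prompts with the lowest accuracy are candidate systematic errors.
for acc, prompt in sorted(results)[:5]:
    print(f"{acc:.2f}  {prompt}")
```

The covering array keeps the number of evaluated prompts well below the full Cartesian product of attribute values while still exercising every pair of attribute values at least once; the lowest-accuracy prompts would then be inspected manually to confirm whether they correspond to genuine systematic errors rather than generation artifacts.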