Although many image classifiers perform well on average, their performance can deteriorate substantially on semantically coherent subgroups of the data that were under-represented during training. These systematic errors can affect both fairness for demographic minority groups and robustness and safety under domain shift. A major challenge is to identify such underperforming subgroups when they are not annotated and occur only very rarely. We leverage recent advances in text-to-image models and search the space of textual descriptions of subgroups ("prompts") for subgroups on which the target model performs poorly on the prompt-conditioned synthesized data. To cope with the exponentially growing number of subgroups, we employ combinatorial testing. We call this procedure PromptAttack, as it can be interpreted as an adversarial attack in prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. We then apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
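To make the role of combinatorial testing concrete, the following is a minimal sketch of one common variant, pairwise (2-way) coverage, applied to prompt construction. The abstract does not state which testing strength or construction the authors use, so the greedy selection, the attribute dimensions, the prompt template, and the names `ATTRIBUTES`, `pairs_of`, `pairwise_cover`, and `build_prompts` are all illustrative assumptions rather than the PromptAttack implementation. Rendering the prompts with a text-to-image model and scoring the classifier on the results are omitted.

```python
from itertools import combinations, product

# Hypothetical attribute dimensions for illustration; not taken from the paper.
ATTRIBUTES = {
    "size": ["small", "large"],
    "color": ["white", "black", "brown"],
    "background": ["in a forest", "on a beach", "in the snow"],
}


def pairs_of(combo):
    """Return all pairs of (dimension index, value) that one full combination covers."""
    return {
        ((i, combo[i]), (j, combo[j]))
        for i, j in combinations(range(len(combo)), 2)
    }


def pairwise_cover(dimensions):
    """Greedily pick combinations until every value pair from two dimensions is covered."""
    candidates = list(product(*dimensions))
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    selected = []
    while uncovered:
        # Take the combination that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        selected.append(best)
        uncovered -= pairs_of(best)
    return selected


def build_prompts(class_name, attributes=ATTRIBUTES):
    """Turn each selected attribute combination into a textual subgroup description."""
    names = list(attributes)
    rows = pairwise_cover([attributes[name] for name in names])
    template = "a photo of a {size} {color} {cls} {background}"  # assumed template
    return [template.format(cls=class_name, **dict(zip(names, row))) for row in rows]


if __name__ == "__main__":
    for prompt in build_prompts("dog"):
        print(prompt)
```

With these three dimensions the full grid has 18 combinations, while a pairwise cover needs at least 9, one per color and background pair. The gap widens quickly as more dimensions are added, which is why a covering design of this kind helps when the number of subgroups grows exponentially.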