The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the background of dog images. This happens when the backgrounds are correlated with other breeds in the training data, leading to misclassifications at test time. Previous SC benchmark datasets suffer from various issues, e.g., over-saturation or containing only one-to-one (O2O) SCs and no many-to-many (M2M) SCs, which arise between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations between dog breeds and background locations. To create this dataset, we employ a text-to-image model to generate photo-realistic images and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality and contains approximately 152,000 images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard splits, where they achieve $<60\%$ accuracy. By examining model misclassifications, we detect reliance on spurious backgrounds, demonstrating that our dataset poses a significant challenge that can drive future research.