Machine learning models have been found to learn shortcuts -- unintended decision rules that fail to generalize -- which undermines their reliability. Previous work addresses this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images, however, are rife with multiple visual cues, from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or instead struggle in a Whac-A-Mole game, in which mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for the watermark shortcut, which we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find that computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple yet effective method that mitigates multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released at https://github.com/facebookresearch/Whac-A-Mole.