Spurious correlations in training data often cause robustness issues, since models learn to use them as shortcuts. For example, when predicting whether an object is a cow, a model might learn to rely on a green background, and would then perform poorly on a cow standing on sand. A standard benchmark for methods that mitigate this problem is Waterbirds. The best current method, Group Distributionally Robust Optimization (GroupDRO), achieves 89\% worst-group accuracy, while standard training from scratch on raw images reaches only 72\%. GroupDRO, however, requires training a model end-to-end with subgroup labels. In this paper, we show that we can achieve up to 90\% worst-group accuracy without using any subgroup information in the training set, simply by extracting embeddings from a large pre-trained vision model and training a linear classifier on top of them. Experiments across a wide range of pre-trained models and pre-training datasets show that both the capacity of the pre-trained model and the size of the pre-training dataset matter: high-capacity vision transformers outperform high-capacity convolutional neural networks, and larger pre-training datasets lead to better worst-group accuracy on the spurious-correlation benchmark.
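The recipe described above (a "linear probe" on frozen features) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the random arrays stand in for embeddings that would, in practice, come from the penultimate layer of a frozen pre-trained vision model, and the labels are a toy signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for embeddings extracted from a frozen
# pre-trained vision model; in practice, X would hold one feature
# vector per training image, computed once with the model frozen.
rng = np.random.default_rng(0)
n_images, embed_dim = 200, 64
X_train = rng.normal(size=(n_images, embed_dim))
# Toy labels correlated with the first embedding dimension.
y_train = (X_train[:, 0] > 0).astype(int)

# Linear probe: only this classifier is trained. No subgroup
# labels are used, and the feature extractor is never updated.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
```

Because the feature extractor is never fine-tuned, the only trainable parameters are the classifier's weight matrix and bias, which makes the approach cheap compared to end-to-end training with GroupDRO.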