We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. With our new datasets, we take stock of previously proposed methods for improving out-of-distribution robustness and put them to the test. Contrary to claims in prior work, we find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, and that improvements on artificial robustness benchmarks can transfer to real-world distribution shifts. Motivated by our observation that data augmentations can help with real-world distribution shifts, we also introduce a new data augmentation method that advances the state of the art and outperforms models pretrained with 1000 times more labeled data. Overall, we find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts, such as geographic changes. Our results show that future research must study multiple distribution shifts simultaneously, as we demonstrate that no evaluated method consistently improves robustness across all of them.