Robustness to distribution shifts is critical for deploying machine learning models in the real world. Despite this necessity, there has been little work on defining the underlying mechanisms that cause these shifts or on evaluating the robustness of algorithms across multiple, different distribution shifts. To this end, we introduce a framework that enables fine-grained analysis of various distribution shifts. We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods, grouped into five categories, across both synthetic and real-world datasets. Overall, we train more than 85K models. Our experimental framework can be easily extended to include new methods, shifts, and datasets. We find, unlike previous work~\citep{Gulrajani20}, that progress has been made over a standard empirical risk minimization (ERM) baseline; in particular, pretraining and augmentations (learned or heuristic) offer large gains in many cases. However, the best methods are not consistent over different datasets and shifts.