Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models. Models that are more accurate on the out-of-distribution data relative to this baseline exhibit "effective robustness" and are exceedingly rare. Identifying such models, and understanding their properties, is key to improving out-of-distribution performance. We conduct a thorough empirical investigation of effective robustness during fine-tuning and surprisingly find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence. We study how properties of the data influence effective robustness, and we show that it increases with dataset size, diversity, and example difficulty. We also find that models that display effective robustness are able to correctly classify 10% of the examples that no other current testbed model gets correct. Finally, we discuss several strategies for scaling effective robustness to the high-accuracy regime in order to improve the out-of-distribution accuracy of state-of-the-art models.
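Concretely, the effective robustness referenced above is the gap between a model's out-of-distribution accuracy and the baseline predicted from the testbed-wide linear trend. The sketch below illustrates one way this measurement is commonly computed, fitting the trend in logit-transformed accuracy space; the function name, example accuracies, and use of scipy are illustrative assumptions rather than the paper's exact procedure.

```python
# Minimal sketch: fit the linear trend relating in-distribution (ID) to
# out-of-distribution (OOD) accuracy across a testbed of models (here in
# logit-logit space), then score a candidate model by how far its OOD
# accuracy sits above that baseline. Names and numbers are hypothetical.
import numpy as np
from scipy.special import logit, expit
from scipy.stats import linregress

def effective_robustness(testbed_id_acc, testbed_ood_acc, model_id_acc, model_ood_acc):
    """All accuracies are fractions in (0, 1)."""
    # Baseline trend fit across the testbed in logit-transformed coordinates.
    slope, intercept, *_ = linregress(logit(np.asarray(testbed_id_acc)),
                                      logit(np.asarray(testbed_ood_acc)))
    # Baseline-predicted OOD accuracy for this ID accuracy, mapped back to (0, 1).
    baseline_ood = expit(slope * logit(model_id_acc) + intercept)
    # Effective robustness: OOD accuracy in excess of the baseline prediction.
    return model_ood_acc - baseline_ood

# Hypothetical testbed accuracies plus one candidate model.
id_acc = [0.70, 0.75, 0.80, 0.85]
ood_acc = [0.40, 0.47, 0.55, 0.63]
print(effective_robustness(id_acc, ood_acc, model_id_acc=0.78, model_ood_acc=0.60))
```

Under this definition, a model lying on the testbed trend scores near zero, and the rare models discussed in the abstract are those with a clearly positive score.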