Synthetic corruptions gathered into a benchmark are frequently used to measure neural network robustness to distribution shifts. However, robustness to synthetic corruption benchmarks is not always predictive of robustness to distribution shifts encountered in real-world applications. In this paper, we propose a methodology to build synthetic corruption benchmarks that make robustness estimations more correlated with robustness to real-world distribution shifts. Using the overlapping criterion, we split synthetic corruptions into categories that help to better understand neural network robustness. Based on these categories, we identify three relevant parameters to take into account when constructing a corruption benchmark that are the (1) number of represented categories, (2) their relative balance in terms of size and, (3) the size of the considered benchmark. In doing so, we build new synthetic corruption selections that are more predictive of robustness to natural corruptions than existing synthetic corruption benchmarks.
翻译:合成腐败被收集到一个基准中,常常用来衡量神经网络的稳健性,以衡量分布变化;然而,合成腐败基准的稳健性并不总是预测真实世界应用中遇到的分布变化的稳健性;在本文件中,我们提议了一种方法,以建立合成腐败基准,使稳健性估计与真实世界分布变化更加相关;使用重叠标准,我们将合成腐败分为有助于更好地了解神经网络稳健性的类别;根据这些类别,我们确定在建立腐败基准时需要考虑的三个相关参数,即(1) 代表性类别的数目,(2) 其规模的相对平衡,(3) 考虑基准的大小。为此,我们建立了新的合成腐败选择,这些选择比现有的合成腐败基准更能预测对自然腐败的稳健性。