Intuitively, one would expect the accuracy of a trained neural network's predictions on test samples to correlate with how densely those samples are surrounded by training samples in representation space. We find that a bound on empirical training error, smoothed across linear activation regions, scales inversely with training sample density in representation space. Empirically, we verify that this bound is a strong predictor of the inaccuracy of the network's predictions on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region's error bound and discarding the samples with the highest bounds raises prediction accuracy by up to 20% in absolute terms on image classification datasets, averaged over discard thresholds.
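The filtering procedure described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes we already have a per-sample error bound for each test sample, ranks samples by that bound, discards the highest-bound fraction, and averages the resulting accuracy gain over a hypothetical grid of discard thresholds. All function names and the threshold grid are assumptions for illustration.

```python
import numpy as np

def selective_accuracy(correct, bounds, discard_frac):
    """Accuracy on the samples kept after discarding the `discard_frac`
    fraction of test samples with the largest error bounds."""
    n = len(correct)
    order = np.argsort(bounds)  # ascending: lowest bound (most trusted) first
    keep = order[: max(1, int(round(n * (1.0 - discard_frac))))]
    return float(np.mean(correct[keep]))

def mean_gain_over_thresholds(correct, bounds, fracs=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Average absolute accuracy gain over a grid of discard fractions
    (the grid here is an illustrative assumption)."""
    base = float(np.mean(correct))
    gains = [selective_accuracy(correct, bounds, f) - base for f in fracs]
    return float(np.mean(gains))

# Toy data where a higher bound makes a wrong prediction more likely,
# mimicking the claimed correlation between the bound and inaccuracy.
rng = np.random.default_rng(0)
bounds = rng.uniform(size=1000)
correct = (rng.uniform(size=1000) > bounds).astype(float)
print(mean_gain_over_thresholds(correct, bounds) > 0)  # gain should be positive
```

Because the synthetic bounds correlate with incorrectness, discarding high-bound samples raises accuracy on the kept set, which is the effect the abstract reports on real image classification test sets.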