The lack of well-calibrated confidence estimates makes neural networks inadequate in safety-critical domains such as autonomous driving or healthcare. In these settings, having the ability to abstain from making a prediction on out-of-distribution (OOD) data can be as important as correctly classifying in-distribution data. We introduce $p$-DkNN, a novel inference procedure that takes a trained deep neural network and analyzes the similarity structures of its intermediate hidden representations to compute $p$-values associated with the end-to-end model prediction. The intuition is that statistical tests performed on latent representations can serve not only as a classifier, but also offer a statistically well-founded estimation of uncertainty. $p$-DkNN is scalable and leverages the composition of representations learned by hidden layers, which makes deep representation learning successful. Our theoretical analysis builds on Neyman-Pearson classification and connects it to recent advances in selective classification (reject option). We demonstrate advantageous trade-offs between abstaining from predicting on OOD inputs and maintaining high accuracy on in-distribution inputs. We find that $p$-DkNN forces adaptive attackers crafting adversarial examples, a form of worst-case OOD inputs, to introduce semantically meaningful changes to the inputs.
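To make the inference procedure concrete, below is a minimal sketch of how per-layer p-values could be computed from intermediate representations and aggregated into an abstention decision. This is not the authors' exact p-DkNN algorithm: the feature inputs (`train_feats`, `test_feats`), the label-disagreement nonconformity score, the calibration-score structure (`calib_scores`), and the use of Fisher's method to combine per-layer p-values are all illustrative assumptions.

```python
# Hypothetical DkNN-style p-value sketch, NOT the paper's exact p-DkNN procedure.
import numpy as np
from scipy.stats import combine_pvalues
from sklearn.neighbors import NearestNeighbors


def fit_layer_indices(train_feats, k=50):
    """Build one kNN index per hidden layer over training-set representations."""
    return [NearestNeighbors(n_neighbors=k).fit(f) for f in train_feats]


def nonconformity(index, train_labels, feats, candidate_label):
    """Per-layer nonconformity: number of neighbors whose label disagrees with the candidate."""
    _, nbr_idx = index.kneighbors(feats)
    return np.sum(train_labels[nbr_idx] != candidate_label, axis=1)


def p_dknn_predict(indices, train_labels, calib_scores, test_feats, num_classes, alpha=0.05):
    """Return (predicted label, combined p-value, abstain flag) for a single input.

    calib_scores[layer][y] is assumed to hold nonconformity scores of a held-out
    calibration set, computed per layer and per candidate class y.
    """
    best_label, best_p = None, -1.0
    for y in range(num_classes):
        layer_pvals = []
        for layer, index in enumerate(indices):
            s = nonconformity(index, train_labels, test_feats[layer], y)[0]
            # Empirical p-value: fraction of calibration scores at least as extreme.
            cal = calib_scores[layer][y]
            p = (np.sum(cal >= s) + 1) / (len(cal) + 1)
            layer_pvals.append(p)
        # Aggregate per-layer p-values (Fisher's method, an illustrative choice).
        _, combined = combine_pvalues(layer_pvals, method='fisher')
        if combined > best_p:
            best_label, best_p = y, combined
    # Abstain when even the most plausible label is statistically unlikely,
    # e.g. for OOD or adversarial inputs.
    return best_label, best_p, best_p < alpha
```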