The ability to detect Out-of-Distribution (OOD) data is important in safety-critical applications of deep learning. The aim is to separate In-Distribution (ID) data drawn from the training distribution from OOD data using a measure of uncertainty extracted from a deep neural network. Deep Ensembles are a well-established method of improving the quality of uncertainty estimates produced by deep neural networks, and have been shown to have superior OOD detection performance compared to single models. An existing intuition in the literature is that the diversity of Deep Ensemble predictions indicates distributional shift, and so measures of diversity such as Mutual Information (MI) should be used for OOD detection. We show experimentally that this intuition does not hold for ImageNet-scale OOD detection -- using MI leads to 30-40% worse %FPR@95 compared to single-model entropy on some OOD datasets. We suggest an alternative explanation for Deep Ensembles' better OOD detection performance -- OOD detection is binary classification, and we are ensembling diverse classifiers. As such, we show that in practice, even better OOD detection performance can be achieved for Deep Ensembles by averaging task-specific detection scores such as Energy over the ensemble.
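The two scores contrasted above can be sketched concretely. Below is a minimal NumPy sketch, assuming ensemble member logits are stacked into an array of shape (members, samples, classes): MI is the entropy of the mean predictive distribution minus the mean per-member entropy, and the ensemble Energy score is the per-member Energy, $-T \log \sum_c e^{l_c / T}$, averaged over members. The function names and the temperature parameter `T` are illustrative, not from the paper.

```python
import numpy as np


def _softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def _entropy(p, axis=-1):
    # Shannon entropy in nats; epsilon guards log(0).
    return -(p * np.log(p + 1e-12)).sum(axis=axis)


def ensemble_mi(logits):
    """Mutual Information of ensemble predictions (a diversity measure).

    logits: array of shape (M, N, C) -- M members, N samples, C classes.
    Returns one MI value per sample: H(mean_m p_m) - mean_m H(p_m).
    """
    probs = _softmax(logits)                       # (M, N, C)
    total_unc = _entropy(probs.mean(axis=0))       # entropy of mean prediction
    data_unc = _entropy(probs).mean(axis=0)        # mean per-member entropy
    return total_unc - data_unc                    # >= 0 by Jensen's inequality


def ensemble_energy(logits, T=1.0):
    """Task-specific Energy score, averaged over ensemble members.

    Per member: -T * logsumexp(logits / T); lower values suggest ID data,
    so the negative Energy can serve directly as an ID-ness score.
    """
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    lse = (m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))).squeeze(-1)
    return (-T * lse).mean(axis=0)                 # (N,)
```

Thresholding either score per sample then yields the binary ID-vs-OOD decision; the abstract's claim is that the averaged task-specific score (here, Energy) is the better choice in practice, not the diversity measure MI.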