Out-of-distribution detection is crucial to the safe deployment of machine learning systems. Currently, unsupervised out-of-distribution detection is dominated by generative-based approaches that make use of estimates of the likelihood or other measurements from a generative model. Reconstruction-based methods offer an alternative approach, in which a measure of reconstruction error is used to determine if a sample is out-of-distribution. However, reconstruction-based approaches are less favoured, as they require careful tuning of the model's information bottleneck, such as the size of the latent dimension, to produce good results. In this work, we exploit the view of denoising diffusion probabilistic models (DDPMs) as denoising autoencoders whose bottleneck is controlled externally, by means of the amount of noise applied. We propose to use DDPMs to reconstruct an input that has been noised to a range of noise levels, and use the resulting multi-dimensional reconstruction error to classify out-of-distribution inputs. We validate our approach both on standard computer-vision datasets and on higher-dimensional medical datasets. Our approach outperforms not only reconstruction-based methods, but also state-of-the-art generative-based approaches. Code is available at https://github.com/marksgraham/ddpm-ood.
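The scoring procedure the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: `toy_denoiser` is a placeholder for a trained DDPM, and the noise schedule and aggregation are simplified assumptions made here for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy, t):
    # Hypothetical stand-in for a trained DDPM: a real model would
    # predict the clean input from x_noisy at noise level t. Here we
    # simply shrink the noisy input as a placeholder.
    return x_noisy * (1.0 - t)

def multilevel_reconstruction_errors(x, noise_levels):
    """Noise x at several levels, reconstruct each noised copy, and
    collect one reconstruction error (MSE) per level, yielding the
    multi-dimensional error the abstract refers to."""
    errors = []
    for t in noise_levels:
        eps = rng.standard_normal(x.shape)
        # Variance-preserving corruption at level t in [0, 1]
        # (a simplified stand-in for the DDPM forward process).
        x_noisy = np.sqrt(1.0 - t) * x + np.sqrt(t) * eps
        x_hat = toy_denoiser(x_noisy, t)
        errors.append(float(np.mean((x - x_hat) ** 2)))
    return np.array(errors)

x = rng.standard_normal((32, 32))      # a toy "image"
levels = [0.1, 0.3, 0.5, 0.7]
errs = multilevel_reconstruction_errors(x, levels)
score = errs.mean()  # simple scalar aggregation for illustration; the
                     # paper classifies on the full error vector instead
```

In practice the vector of per-level errors, rather than a single mean, would be fed to a downstream classifier or thresholding rule to flag out-of-distribution inputs.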