Deep networks often make confident yet incorrect predictions when tested on outlier data far removed from their training distributions. Likelihoods computed by deep generative models (DGMs) are a candidate metric for outlier detection with unlabeled data. Yet previous studies have shown that DGM likelihoods are unreliable and easily biased by simple transformations of the input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We propose novel analytical and algorithmic approaches to ameliorate key biases in VAE likelihoods. Our bias corrections are sample-specific, computationally inexpensive, and readily computed for various decoder visible distributions. Next, we show that a well-known image pre-processing technique, contrast stretching, extends the effectiveness of bias correction to further improve outlier detection. Our approach achieves state-of-the-art accuracies on nine grayscale and natural image datasets, and demonstrates significant advantages in both speed and performance over four recent competing approaches. In summary, lightweight remedies suffice to achieve robust outlier detection with VAEs.
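The contrast stretching mentioned in the abstract is a standard percentile-based intensity rescaling. A minimal sketch follows; the function name, percentile defaults, and output range are illustrative assumptions, not the paper's exact pre-processing:

```python
import numpy as np

def contrast_stretch(img, lower_pct=2.0, upper_pct=98.0):
    """Percentile-based contrast stretching.

    Rescales pixel intensities so that the lower/upper percentiles map to
    the full [0, 1] range, with values outside that band clipped.
    """
    lo, hi = np.percentile(img, [lower_pct, upper_pct])
    if hi <= lo:  # flat (constant) image: nothing to stretch
        return np.zeros_like(img, dtype=np.float64)
    stretched = (img.astype(np.float64) - lo) / (hi - lo)
    return np.clip(stretched, 0.0, 1.0)

# Example: a low-contrast 8-bit image occupying only a narrow intensity band
rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(28, 28)).astype(np.uint8)
out = contrast_stretch(img)
print(out.min(), out.max())  # stretched to span the full [0, 1] range
```

The design choice here is percentile (rather than strict min/max) stretching, which is robust to a few extreme pixels; the clip step keeps the output bounded for downstream likelihood evaluation.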