Deep networks often make confident, yet incorrect, predictions when tested on outlier data that lies far from their training distributions. Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data. Yet, previous studies have shown that such likelihoods are unreliable and can be easily biased by simple transformations of the input data. Here, we examine outlier detection with variational autoencoders (VAEs), one of the simplest classes of deep generative models. First, we show that a theoretically grounded correction readily ameliorates a key bias in VAE likelihood estimates. The bias correction is model-free, sample-specific, and accurately computed with the Bernoulli and continuous Bernoulli visible distributions. Second, we show that a well-known preprocessing technique, contrast normalization, extends the effectiveness of bias correction to natural image datasets. Third, we show that the variance of the likelihoods computed over an ensemble of VAEs also enables robust outlier detection. We perform a comprehensive evaluation of our remedies with nine (grayscale and natural) image datasets, and demonstrate significant advantages, in terms of both speed and accuracy, over four other state-of-the-art methods. Our lightweight remedies are biologically inspired and may serve to achieve efficient outlier detection with many types of deep generative models.
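To make the abstract's notion of a model-free, sample-specific bias correction concrete, the following is a minimal NumPy sketch under our own assumptions: the corrected outlier score subtracts, from the visible-layer log-likelihood of each sample, the highest log-likelihood that the same visible distribution could possibly assign to that sample. For the Bernoulli case this maximum has a closed form (parameter equal to the pixel value); for the continuous Bernoulli case the sketch uses a per-pixel grid search. All function names, the grid-search shortcut, and the toy data are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a sample-specific likelihood bias correction for VAEs.
# Names and the grid-search approach are illustrative assumptions.
import numpy as np

def bernoulli_loglik(x, lam, eps=1e-7):
    """Bernoulli log-likelihood log p(x | lam), summed over pixels."""
    lam = np.clip(lam, eps, 1.0 - eps)
    return np.sum(x * np.log(lam) + (1.0 - x) * np.log(1.0 - lam), axis=-1)

def cb_log_norm_const(lam, eps=1e-7):
    """Log normalizing constant of the continuous Bernoulli, log C(lam)."""
    lam = np.clip(lam, eps, 1.0 - eps)
    near_half = np.abs(lam - 0.5) < 1e-3
    safe = np.where(near_half, 0.4, lam)  # avoid 0/0 near lam = 0.5; result overwritten below
    logC = np.log(2.0 * np.arctanh(1.0 - 2.0 * safe) / (1.0 - 2.0 * safe))
    return np.where(near_half, np.log(2.0), logC)  # C(0.5) = 2 in the limit

def cb_loglik(x, lam, eps=1e-7):
    """Continuous Bernoulli log-likelihood, summed over pixels."""
    lam = np.clip(lam, eps, 1.0 - eps)
    return np.sum(cb_log_norm_const(lam) + x * np.log(lam)
                  + (1.0 - x) * np.log(1.0 - lam), axis=-1)

def max_bernoulli_loglik(x, eps=1e-7):
    """Highest Bernoulli log-likelihood achievable for x, attained at lam = x."""
    return bernoulli_loglik(x, x, eps)

def max_cb_loglik(x, grid_size=999):
    """Highest continuous-Bernoulli log-likelihood for x via per-pixel grid search."""
    lams = np.linspace(1e-3, 1.0 - 1e-3, grid_size)
    # Log-likelihood of every pixel under every candidate lam: shape (grid, batch, pixels).
    ll = (cb_log_norm_const(lams)[:, None, None]
          + x[None] * np.log(lams)[:, None, None]
          + (1.0 - x[None]) * np.log(1.0 - lams)[:, None, None])
    # Pixels are independent, so maximize each pixel separately, then sum.
    return np.sum(ll.max(axis=0), axis=-1)

def corrected_score(x, decoder_lam, visible="cb"):
    """Bias-corrected score: model log-likelihood minus the sample-specific maximum."""
    if visible == "bernoulli":
        return bernoulli_loglik(x, decoder_lam) - max_bernoulli_loglik(x)
    return cb_loglik(x, decoder_lam) - max_cb_loglik(x)

# Toy usage: 4 fake "images" of 784 pixels in [0, 1] and fake decoder outputs.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 784))
decoder_lam = np.clip(x + 0.05 * rng.normal(size=x.shape), 1e-3, 1.0 - 1e-3)
print(corrected_score(x, decoder_lam, visible="cb"))  # one corrected score per sample
```

Because the subtracted term depends only on the input sample and the chosen visible distribution, a correction of this form is model-free in the sense used above: it requires no retraining and no access to model internals beyond the decoder's output likelihood.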