Recent research has revealed that deep generative models, including flow-based models and variational autoencoders, may assign higher likelihood to out-of-distribution (OOD) data than to in-distribution (ID) data. However, such OOD data are almost never sampled from the model. This counterintuitive phenomenon has not been satisfactorily explained. In this paper, we prove theorems that investigate the divergences in flow-based models and give two explanations of the above phenomenon, from a divergence perspective and a geometric perspective, respectively. Based on our analysis, we propose two group anomaly detection methods. Furthermore, we decompose the KL divergence and propose a point-wise anomaly detection method. We have conducted extensive experiments on prevalent benchmarks to evaluate our methods. For group anomaly detection (GAD), our method achieves near 100\% AUROC on all problems and is robust against data manipulation. In contrast, the state-of-the-art (SOTA) GAD method performs no better than random guessing on challenging problems and can be defeated by data manipulation in almost all cases. For point-wise anomaly detection (PAD), our method is comparable to the SOTA PAD method on one category of problems and significantly outperforms the baseline on another category of problems.
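The likelihood scoring that underlies this phenomenon can be illustrated with the change-of-variables formula used by flow-based models: a flow maps a data point x to a latent z under a simple base density, and the model's log-likelihood is the base log-density plus the log-determinant of the Jacobian. The sketch below is a minimal illustration with a toy one-dimensional affine flow whose parameters a and b are hypothetical stand-ins for a learned transformation, not part of any method described above; it shows how negative log-likelihood is typically used as an anomaly score.

```python
import numpy as np

def flow_log_likelihood(x, a=2.0, b=1.0):
    """Log-likelihood of x under a toy affine flow z = (x - b) / a
    with a standard-normal base density.

    Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|.
    Here a and b are illustrative constants; a real flow learns them.
    """
    z = (x - b) / a
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))  # log N(z; 0, 1)
    log_det = -np.log(abs(a))                          # log|dz/dx| = -log|a|
    return log_base + log_det

def anomaly_score(x, a=2.0, b=1.0):
    """Lower model likelihood => higher anomaly score."""
    return -flow_log_likelihood(x, a, b)
```

Points far from the mode of the induced density receive lower log-likelihood and hence a higher score; the counterintuitive finding discussed above is that for deep models trained on real data, some OOD inputs can nevertheless land in high-likelihood regions.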