Recent research has revealed that deep generative models, including flow-based models and variational autoencoders, may assign higher likelihood to out-of-distribution (OOD) data than to in-distribution (ID) data. However, such OOD data are almost never sampled from the model. This counterintuitive phenomenon has not been satisfactorily explained. In this paper, we prove theorems that investigate the divergences in flow-based models and give two explanations of the above phenomenon, from a divergence perspective and a geometric perspective, respectively. Based on our analysis, we propose two group anomaly detection methods. Furthermore, we decompose the KL divergence and propose a point-wise anomaly detection method. We have conducted extensive experiments on prevalent benchmarks to evaluate our methods. For group anomaly detection (GAD), our method achieves near 100\% AUROC on all problems and is robust against data manipulation. In contrast, the state-of-the-art (SOTA) GAD method performs no better than random guessing on challenging problems and can be defeated by data manipulation in almost all cases. For point-wise anomaly detection (PAD), our method is comparable to the SOTA PAD method on one category of problems and significantly outperforms the baseline on another category of problems.
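The likelihood scoring that underlies this phenomenon can be illustrated with the change-of-variables formula used by flow-based models: a flow maps a data point x to a latent z under a simple base density, and the model's log-likelihood is the base log-density plus the log-determinant of the Jacobian. The sketch below is a minimal illustration with a toy one-dimensional affine flow whose parameters a and b are hypothetical stand-ins for a learned transformation, not part of any method described above; it shows how negative log-likelihood is typically used as an anomaly score.

```python
import numpy as np

def flow_log_likelihood(x, a=2.0, b=1.0):
    """Log-likelihood of x under a toy affine flow z = (x - b) / a
    with a standard-normal base density.

    Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|.
    Here a and b are illustrative constants; a real flow learns them.
    """
    z = (x - b) / a
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))  # log N(z; 0, 1)
    log_det = -np.log(abs(a))                          # log|dz/dx| = -log|a|
    return log_base + log_det

def anomaly_score(x, a=2.0, b=1.0):
    """Lower model likelihood => higher anomaly score."""
    return -flow_log_likelihood(x, a, b)
```

Points far from the mode of the induced density receive lower log-likelihood and hence a higher score; the counterintuitive finding discussed above is that for deep models trained on real data, some OOD inputs can nevertheless land in high-likelihood regions.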