In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data set also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to 'separate' the latent vectors of normal data from those of outlier data. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods apply to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine; they can be combined with any VAE model architecture and are naturally compatible with ensembling. Compared with state-of-the-art SSAD methods that are not specific to particular data types, we obtain a marked improvement in outlier detection.