This paper explores semi-supervised anomaly detection, a more practical setting for anomaly detection in which a small set of labeled outlier samples is provided in addition to a large amount of unlabeled data for training. Rethinking the optimization target of anomaly detection, we propose a new objective function that measures the KL-divergence between normal and anomalous data, and prove that two factors, namely the mutual information between the data and the latent representations and the entropy of the latent representations, constitute an integral objective function for anomaly detection. To resolve the conflict in simultaneously optimizing these two factors, we propose a novel encoder-decoder-encoder structure, with the first encoder focusing on optimizing the mutual information and the second encoder focusing on optimizing the entropy. The two encoders are encouraged to share similar encodings through a consistency constraint on their latent representations. Extensive experiments show that the proposed method significantly outperforms several state-of-the-art methods on multiple benchmark datasets, including a medical diagnosis dataset and several classic anomaly detection benchmarks.
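The encoder-decoder-encoder pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear-plus-tanh layers, the layer sizes, and the use of a mean-squared reconstruction loss as a stand-in for the mutual-information term are all assumptions made for demonstration, and the class name `EncoderDecoderEncoder` is hypothetical.

```python
# Minimal sketch (NOT the paper's code) of an encoder-decoder-encoder structure:
# a first encoder E1 maps input x to latent z1, a decoder D reconstructs x_hat,
# and a second encoder E2 re-encodes x_hat to z2. A consistency loss ties z1 and
# z2 together, as in the paper's shared-encoding constraint. Layer shapes and
# losses are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def make_linear(d_in, d_out):
    # Small random weight matrix for a single linear layer.
    return rng.normal(0.0, 0.1, (d_in, d_out))

class EncoderDecoderEncoder:
    def __init__(self, d_x=8, d_z=2):
        self.W_e1 = make_linear(d_x, d_z)  # first encoder E1
        self.W_d = make_linear(d_z, d_x)   # decoder D
        self.W_e2 = make_linear(d_x, d_z)  # second encoder E2

    def forward(self, x):
        z1 = np.tanh(x @ self.W_e1)        # latent from the first encoder
        x_hat = np.tanh(z1 @ self.W_d)     # reconstruction of the input
        z2 = np.tanh(x_hat @ self.W_e2)    # latent from the second encoder
        return z1, x_hat, z2

    def losses(self, x):
        z1, x_hat, z2 = self.forward(x)
        # Reconstruction error: illustrative proxy for the mutual-information term.
        recon = float(np.mean((x - x_hat) ** 2))
        # Consistency term: forces the two encoders toward a shared encoding.
        consistency = float(np.mean((z1 - z2) ** 2))
        return recon, consistency

model = EncoderDecoderEncoder()
x = rng.normal(size=(4, 8))
recon, consistency = model.losses(x)
print(recon, consistency)
```

In a full training setup, both terms would be minimized jointly (e.g. by gradient descent on a weighted sum), and the reconstruction or latent discrepancy on a test sample could then serve as its anomaly score.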