Anomaly detection aims to identify data points that deviate systematically from the majority of the data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, an assumption that is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer binary labels for each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018), which considers synthetically created, labeled anomalies, we use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular datasets, and a video anomaly detection benchmark show consistent and significant improvements over the baselines.
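The alternating scheme described above can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch illustration of the block coordinate updates: with parameters fixed, the assumed contamination fraction `alpha` of highest-scoring points is labeled anomalous; with labels fixed, a gradient step is taken on a loss whose two terms share parameters. The `ToyDetector` backbone, the logistic loss, and the value of `alpha` are illustrative assumptions, not the paper's exact models or objectives.

```python
# Minimal sketch of alternating label inference and parameter updates,
# assuming a PyTorch setup. Backbone and losses are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDetector(nn.Module):
    """Hypothetical backbone: maps inputs to a scalar anomaly score."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)  # higher score = more anomalous


def train_with_latent_labels(x, alpha=0.1, epochs=50, lr=1e-3):
    """Alternate between inferring binary labels and updating parameters.

    x:     (n, dim) unlabeled training data, assumed to contain anomalies
    alpha: assumed fraction of anomalies in the training set
    """
    model = ToyDetector(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    n_anom = int(alpha * len(x))
    for _ in range(epochs):
        # Block 1: parameters fixed, infer the most likely labels by
        # flagging the alpha-fraction of points with the highest scores.
        with torch.no_grad():
            labels = torch.zeros(len(x))
            labels[model(x).topk(n_anom).indices] = 1.0
        # Block 2: labels fixed, take a gradient step on the shared
        # parameters; normal points are pulled toward low scores and
        # flagged points toward high scores. A logistic loss is one
        # simple choice; the paper's losses depend on the backbone.
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(x), labels)
        loss.backward()
        opt.step()
    return model
```

For example, `train_with_latent_labels(torch.randn(512, 16), alpha=0.05)` trains on synthetic data; at test time, the scores returned by the model can be thresholded to flag anomalies.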