Deep generative models are challenging classical methods in the field of anomaly detection. Each new method is published with evidence that it outperforms its predecessors, yet the reported results often contradict one another. The objective of this comparison is twofold: to compare anomaly detection methods of various paradigms, with a focus on deep generative models, and to identify the sources of variability that can yield different results. The methods were compared on popular tabular and image datasets. We identified the main sources of variability to be the experimental conditions: i) the type of dataset (tabular or image) and the nature of the anomalies (statistical or semantic), and ii) the strategy for selecting hyperparameters, especially the number of anomalies available in the validation set. Different methods perform best in different contexts, i.e., combinations of experimental conditions together with computational time. This explains the variability of previous results and highlights the importance of carefully specifying the context when publishing a new method. All our code and results are available for download.