Anomaly detection presents a unique challenge in machine learning due to the scarcity of labeled anomaly data. Recent work attempts to mitigate this problem by augmenting the training of deep anomaly detection models with additional labeled anomaly samples. However, the labeled data often does not align with the target distribution and introduces harmful bias into the trained model. In this paper, we aim to understand the effect of a biased anomaly set on anomaly detection. Concretely, we view anomaly detection as a supervised learning task whose objective is to optimize the recall at a given false positive rate. We formally study the relative scoring bias of an anomaly detector, defined as the difference in its performance with respect to a baseline anomaly detector. We establish the first finite-sample rates for estimating the relative scoring bias in deep anomaly detection, and empirically validate our theoretical results on both synthetic and real-world datasets. We also provide an extensive empirical study of how a biased training anomaly set affects the anomaly score function and, in turn, the detection performance on different anomaly classes. Our study demonstrates scenarios in which a biased anomaly set can be useful or problematic, and provides a solid benchmark for future research.
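To make the quantities above concrete, here is one possible formalization; the notation is introduced only for illustration and is not taken from the paper itself. Let $g$ denote the anomaly score function of a detector trained with the additional labeled anomaly set and $g_0$ that of the baseline detector. For a target false positive rate $\alpha$, choose the threshold $\tau_\alpha$ on normal data so that $\mathrm{FPR}(g, \tau_\alpha) = \alpha$, measure performance as the recall at that threshold,
\[
  R(g, \alpha) \;=\; \Pr\bigl[\, g(x) > \tau_\alpha \;\big|\; x \text{ is anomalous} \,\bigr],
\]
and take the relative scoring bias to be the difference in performance with respect to the baseline,
\[
  \Delta(\alpha) \;=\; R(g, \alpha) \;-\; R(g_0, \alpha).
\]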