Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.
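The abstract does not say how "optimal data coverage" is achieved, so the sketch below only illustrates the general idea of spending a fixed labeling budget on queries that cover the data. It swaps in greedy k-center (farthest-point) selection, a standard coverage heuristic. It is not claimed to be the paper's querying strategy. The feature vectors are assumed to come from some embedding of the data, and the helper names `select_queries` and `coverage_radius` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_queries(features: List[Point], budget: int, first: int = 0) -> List[int]:
    """Hypothetical helper: greedy k-center selection of `budget` indices to label.

    Each step queries the point farthest from all points queried so far, which
    greedily shrinks the largest distance from any point to its nearest query.
    """
    n = len(features)
    if budget <= 0 or n == 0:
        return []
    budget = min(budget, n)  # cannot query more points than exist
    selected = [first]
    chosen = {first}
    # Distance from every point to its nearest already-selected query.
    nearest = [euclidean(p, features[first]) for p in features]
    while len(selected) < budget:
        # Among unqueried points, pick the one that is currently worst covered.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Update each point's nearest-query distance with the new query.
        nearest = [min(d, euclidean(p, features[nxt])) for d, p in zip(nearest, features)]
    return selected


def coverage_radius(features: List[Point], selected: List[int]) -> float:
    """Largest distance from any point to its nearest selected query."""
    return max(min(euclidean(p, features[s]) for s in selected) for p in features)


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
    queries = select_queries(feats, budget=3)
    print(queries)  # [0, 4, 2]: one query per cluster
    print(coverage_radius(feats, queries))
```

With a budget of three, the greedy rule places one query in each of the three groups of points, so every unlabeled point lies within about 0.1 of a labeled one. The design choice here is purely geometric. A method that also accounts for anomaly scores or model uncertainty could select different points.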