We hypothesize that similar objects should have similar outlier scores. To our knowledge, all existing outlier detectors calculate the outlier score of each object independently, regardless of the outlier scores of the other objects. They therefore do not guarantee that similar objects receive similar outlier scores. To verify our hypothesis, we propose an outlier score post-processing technique for outlier detectors, called neighborhood averaging (NA), which considers each object together with its neighbors and ensures that their outlier scores become more similar than the original ones. Given an object and its outlier score from any outlier detector, NA modifies the score by combining it with the scores of the object's k nearest neighbors. We demonstrate the effectiveness of NA using the well-known k-nearest neighbors (k-NN) method. Experimental results show that NA improves all ten tested baseline detectors by 13% on average (from 0.70 to 0.79 AUC), evaluated on nine real-world datasets. Moreover, even outlier detectors that are themselves based on k-NN are improved. The experiments also show that in some applications the choice of detector is no longer significant when detectors are combined with NA, which challenges the widely held view that the data model is the most important factor. Our code is publicly available at www.outlierNet.com for reproducibility.
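The post-processing described above can be sketched in a few lines. This is a minimal illustration only, assuming Euclidean distance, brute-force neighbor search, and uniform averaging of the object's score with its neighbors' scores; the function name and parameters are illustrative, not the authors' released implementation.

```python
import numpy as np

def neighborhood_averaging(X, scores, k=5):
    """Smooth outlier scores over each object's k nearest neighbors (a sketch).

    X      : (n, d) array of objects.
    scores : (n,) array of outlier scores from any base detector.
    k      : number of nearest neighbors to average with (assumed uniform weights).
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Pairwise Euclidean distances (brute force; a KD-tree would scale better).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude the object itself as its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbors
    # Combine each object's score with its neighbors' scores by simple averaging.
    return (scores + scores[nn].sum(axis=1)) / (k + 1)
```

Note that NA is detector-agnostic: `scores` can come from any base detector (LOF, k-NN distance, isolation forest, etc.), which is what allows it to be applied as a post-processing step.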