This study explores the concept of high-density anomalies. In contrast to the traditional view of anomalies as isolated occurrences, high-density anomalies are deviant cases positioned in the most normal regions of the data space. Such anomalies are relevant for various practical use cases, such as misbehavior detection and data quality analysis. Effective methods for identifying them are particularly important when analyzing very large or noisy datasets, for which traditional anomaly detection algorithms return many false positives. To identify high-density anomalies, this study introduces several non-parametric algorithmic frameworks for unsupervised detection. These frameworks can leverage existing anomaly detection algorithms and offer different solutions to the balancing problem inherent in this detection task. The frameworks are evaluated on both synthetic and real-world datasets and compared with existing baseline algorithms for detecting traditional anomalies. The Iterative Partial Push (IPP) framework yields the best detection results.