We combine the metrics of distance and isolation to develop the Analytic Isolation and Distance-based Anomaly (AIDA) detection algorithm. AIDA is the first distance-based method that does not rely on the concept of nearest neighbours, which makes it a parameter-free model. Unlike the prevailing literature, in which the isolation metric is always computed via simulations, we show that AIDA admits an analytical expression for the outlier score, which provides new insights into the isolation metric. Additionally, we present an anomaly explanation method based on AIDA, the Tempered Isolation-based eXplanation (TIX) algorithm, which finds the most relevant outlier features even in data sets with hundreds of dimensions. We test both algorithms on synthetic and empirical data. AIDA is competitive with other state-of-the-art methods and is superior at finding outliers hidden in multidimensional feature subspaces. Finally, we illustrate how the TIX algorithm is able to find outliers in multidimensional feature subspaces, and we use these explanations to analyze common benchmarks used in anomaly detection.