Anomaly detection is critical in many fields, including intrusion detection, health monitoring, fault diagnosis, and sensor network event detection. The isolation forest (iForest) approach is a well-known technique for detecting anomalies. It is, however, ineffective on dynamic streaming data, which is becoming increasingly prevalent across a wide variety of application areas. In this work, we extend our previous work by proposing an efficient iForest-based approach for anomaly detection that uses cube sampling and is effective on streaming data. Cube sampling is used in the initial stage to choose nearly balanced samples, significantly reducing storage requirements while preserving efficiency. The streaming nature of the data is then addressed by a sliding window technique that generates consecutive chunks of data for systematic processing. The novelty of this paper lies in applying cube sampling within iForest and calculating the inclusion probability. The proposed approach detects anomalies as successfully as existing state-of-the-art approaches while requiring significantly less storage and time. We empirically evaluate the proposed approach on standard datasets and demonstrate that it outperforms traditional approaches in terms of Area Under the ROC Curve (AUC-ROC) and can handle high-dimensional streaming data.
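The pipeline summarized above (sample, window, isolate) can be illustrated with a minimal sketch. It is not the authors' implementation: cube sampling is not reproduced here, and simple random sampling without replacement (equal inclusion probability `sample_size / len(window)` for every point) is swapped in as a stand-in. The helper names `sliding_windows`, `build_tree`, `path_length`, and `anomaly_scores`, as well as the window size, step, and tree count, are hypothetical choices made for illustration only.

```python
import math
import random

def sliding_windows(stream, size, step):
    """Yield consecutive chunks of `size` points, advancing by `step` points."""
    buf = []
    for point in stream:
        buf.append(point)
        if len(buf) == size:
            yield list(buf)
            del buf[:step]

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)

def anomaly_scores(window, queries, n_trees=100, sample_size=64, seed=0):
    """Score queries in (0, 1); higher means more anomalous. Needs >= 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(window))
    max_depth = math.ceil(math.log2(psi))
    # ASSUMPTION: uniform sampling stands in for cube sampling here.
    trees = [build_tree(rng.sample(window, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = n_trees * c_factor(psi)
    return [2.0 ** (-sum(path_length(t, q) for t in trees) / norm)
            for q in queries]

# Synthetic 2-D stream with one injected anomaly (illustrative data only).
data_rng = random.Random(42)
stream = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(256)]
stream[100] = (15.0, 15.0)

for chunk in sliding_windows(stream, size=128, step=64):
    scores = anomaly_scores(chunk, chunk)
    top = max(range(len(chunk)), key=scores.__getitem__)
    print("most anomalous point in window:", chunk[top])
```

Only `sample_size` points per tree are kept from each window, which is where the storage saving in this kind of design comes from. A balanced sampling scheme such as cube sampling would replace the single `rng.sample` call, with inclusion probabilities chosen per point rather than fixed at one common value.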