The visualization and detection of anomalies (outliers) are crucially important in many fields, particularly cybersecurity. Several approaches have been proposed in these fields, yet to the best of our knowledge none of them fulfills both objectives, simultaneously or cooperatively, in one coherent framework. The visualization methods of these approaches were introduced to explain the output of a detection algorithm, not to support data exploration that facilitates standalone visual detection. This is our point of departure: UN-AVOIDS, an unsupervised and nonparametric approach for both visualization (a human process) and detection (an algorithmic process) of outliers, which assigns invariant anomaly scores (normalized to $[0,1]$) rather than a hard binary decision. The main novelty of UN-AVOIDS is that it transforms data into a new space, introduced in this paper as the neighborhood cumulative density function (NCDF), in which both visualization and detection are carried out. In this space, outliers are remarkably visually distinguishable, and therefore the anomaly scores assigned by the detection algorithm achieve a high area under the ROC curve (AUC). We assessed UN-AVOIDS on simulated data and on two recently published cybersecurity datasets, and compared it to three of the most successful anomaly detection methods: LOF, IF, and FABOD. In terms of AUC, UN-AVOIDS was almost an overall winner. The article concludes with a preview of new theoretical and practical avenues for UN-AVOIDS. Among them is the design of visualization-aided anomaly detection (VAAD) software, which aids analysts by providing the UN-AVOIDS detection algorithm (running in a back-end engine), the NCDF visualization space (rendered as plots), and other conventional methods of visualization in the original feature space, all linked in one interactive environment.
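To make the idea of scoring in a neighborhood-based space concrete, here is a minimal sketch in plain Python. It is not the UN-AVOIDS algorithm: the abstract does not define the NCDF, so the code assumes, purely for illustration, that each point's curve is the fraction of the other points lying within a growing radius. The function names `ncdf_curves` and `anomaly_scores`, the scoring rule, and the toy data are all hypothetical. Only the AUC computation is standard: it is the probability that a randomly chosen outlier scores higher than a randomly chosen inlier.

```python
import math


def ncdf_curves(points, radii):
    """ASSUMED neighborhood curve: for each point, the fraction of the
    other points within each radius (not the paper's exact definition)."""
    n = len(points)
    curves = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        curves.append([sum(d <= r for d in dists) / (n - 1) for r in radii])
    return curves


def anomaly_scores(curves):
    """HYPOTHETICAL score: one minus the mean curve height, then min-max
    normalized so that every score lies in [0, 1]."""
    raw = [1.0 - sum(c) / len(c) for c in curves]
    lo, hi = min(raw), max(raw)
    if hi == lo:  # all points look alike, so no point is more anomalous
        return [0.0] * len(raw)
    return [(s - lo) / (hi - lo) for s in raw]


def auc(scores, labels):
    """Area under the ROC curve as P(outlier score > inlier score),
    counting ties as one half. Labels: 1 = outlier, 0 = inlier."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: a tight cluster of four inliers and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1]
radii = [0.2, 1.0, 8.0]

scores = anomaly_scores(ncdf_curves(points, radii))
print(scores, auc(scores, labels))
```

On this toy set the isolated point's curve rises much later than those of the clustered points, which is the kind of visual separation the abstract describes. Because the scores are continuous rather than binary, they can be ranked and evaluated with AUC without choosing a threshold.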