Most proposals in the anomaly detection field focus exclusively on the detection stage, especially the recent deep learning approaches. While providing highly accurate predictions, these models often lack transparency, acting as "black boxes". This criticism has grown to the point that explanation is now considered highly relevant to acceptability and reliability. In this paper, we address this issue by inspecting the ADMNC (Anomaly Detection on Mixed Numerical and Categorical Spaces) model, an existing, highly accurate but opaque anomaly detector capable of operating on both numerical and categorical inputs. This work presents the extension EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical Spaces), which adds explainability to the predictions obtained with the original model. We preserve the scalability of the original method thanks to the Apache Spark framework. EADMNC leverages the formulation of the previous ADMNC model to offer pre hoc and post hoc explainability while maintaining the accuracy of the original architecture. We present a pre hoc model that globally explains the outputs by segmenting input data into homogeneous groups, each described with only a few variables. We designed a graphical representation based on regression trees, which supervisors can inspect to understand the differences between normal and anomalous data. Our post hoc explanations consist of a text-based template method that locally provides textual arguments supporting each detection. We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection. The usefulness of the explanations is assessed through a theoretical analysis using expert knowledge in the network intrusion domain.