The explanation of an AI model's prediction used to support decision making in cyber security is of critical importance. This is especially so when the model's incorrect prediction can lead to severe damage, or even the loss of lives and critical assets. However, most existing AI models lack the ability to explain their prediction results, despite their strong performance in most scenarios. In this work, we propose a novel explainable AI method, called PhilaeX, that provides a heuristic means of identifying an optimized subset of features to form a complete explanation of an AI model's prediction. It first identifies the features that lead to the model's borderline prediction, and then extracts those with positive individual contributions. The feature attributions are quantified through the optimization of a Ridge regression model. We verify the explanation fidelity through two experiments. First, we assess our method's ability to correctly identify the activated features in adversarial samples of Android malware, using the feature attribution values produced by PhilaeX. Second, deduction and augmentation tests are used to assess the fidelity of the explanations. The results show that PhilaeX is able to explain different types of classifiers correctly, with higher-fidelity explanations compared to state-of-the-art methods such as LIME and SHAP.
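The sketch below illustrates the attribution-quantification step described above: a Ridge regression surrogate is fit on perturbations of a candidate feature subset, and its coefficients are read off as feature attribution values. The classifier interface, the binary perturbation scheme, and the function name are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of quantifying feature attributions with a Ridge surrogate.
# Assumes a binary classifier exposing predict_proba-like scores and binary
# (present/absent) features, as in Android malware detection.
import numpy as np
from sklearn.linear_model import Ridge


def ridge_attributions(predict_proba, x, candidate_idx, n_samples=500, seed=0):
    """Estimate attributions for the candidate features of instance `x`.

    predict_proba : callable mapping an (n, d) array to P(class=1), shape (n,)
    x             : the instance being explained, shape (d,)
    candidate_idx : indices of the candidate subset (e.g. features with
                    positive individual contributions to the prediction)
    """
    rng = np.random.default_rng(seed)
    # Binary masks: 1 keeps the feature value, 0 zeroes it out (an assumed
    # perturbation choice for binary malware features).
    masks = rng.integers(0, 2, size=(n_samples, len(candidate_idx)))
    perturbed = np.tile(x, (n_samples, 1))
    perturbed[:, candidate_idx] = perturbed[:, candidate_idx] * masks
    y = predict_proba(perturbed)
    # Coefficients of the fitted Ridge surrogate serve as attribution values.
    surrogate = Ridge(alpha=1.0).fit(masks, y)
    return dict(zip(candidate_idx, surrogate.coef_))
```

Features with larger coefficients contribute more to pushing the model toward its prediction; the deduction and augmentation tests mentioned above can then check whether removing or adding the top-attributed features changes the prediction as expected.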