Algorithm-agnostic approaches for explaining the contribution of individual features towards a predicted outcome are an emerging area of research. Whilst the focus has been on explaining the prediction itself, little has been done on explaining the robustness of these models, that is, how each feature contributes towards achieving that robustness. In this paper, we propose the use of Shapley values to explain the contribution of each feature towards the model's robustness, measured in terms of the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC). With the help of an illustrative example, we demonstrate the proposed idea of explaining the ROC curve and visualising the uncertainties in these curves. For imbalanced datasets, the Precision-Recall Curve (PRC) is considered more appropriate; we therefore also demonstrate how to explain PRCs with the help of Shapley values. The explanation of robustness can help analysts in a number of ways. For example, it can support feature selection by identifying irrelevant features that can be removed to reduce computational complexity. It can also help identify features that make critical or negative contributions towards robustness.
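To make the idea of attributing AUC to features concrete, the following is a minimal sketch rather than the method used in the paper. It assumes a value function v(S) equal to the AUC obtained when only the features in subset S are used, and computes exact Shapley values by enumerating all subsets. The names `auc`, `shapley_values`, and `subset_auc`, the synthetic data, and the stand-in scorer (a plain sum of the selected features) are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial
import random


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subset_auc(X, y, S):
    """Assumed value function: AUC of a stand-in scorer that sums the features in S.

    With S empty every score is 0, so all pairs tie and the AUC is 0.5.
    """
    scores = [sum(row[k] for k in S) for row in X]
    return auc(scores, y)


def shapley_values(X, y, value_fn):
    """Exact Shapley value of each feature for the value function value_fn(X, y, S)."""
    d = len(X[0])
    cache = {}

    def v(S):
        if S not in cache:
            cache[S] = value_fn(X, y, S)
        return cache[S]

    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            # Shapley weight for coalitions of size r that exclude feature j
            w = factorial(r) * factorial(d - r - 1) / factorial(d)
            for S in combinations(others, r):
                S_with_j = tuple(sorted(S + (j,)))
                phi[j] += w * (v(S_with_j) - v(S))
    return phi


# Synthetic data (an assumption): feature 0 is informative, feature 1 weakly so, feature 2 is noise.
random.seed(0)
y = [i % 2 for i in range(200)]
X = [[yi + random.gauss(0, 1), 0.3 * yi + random.gauss(0, 1), random.gauss(0, 1)] for yi in y]

phi = shapley_values(X, y, subset_auc)
full_auc = subset_auc(X, y, (0, 1, 2))
print("AUC with all features:", round(full_auc, 3))
print("Per-feature AUC contributions:", [round(p, 3) for p in phi])
```

By the efficiency property of Shapley values, the contributions sum to the full-model AUC minus the 0.5 baseline of the empty feature set. A feature with a contribution near zero is a candidate for removal, and a negative contribution flags a feature that lowers the AUC under this scorer. Exact enumeration costs on the order of 2^d evaluations, so this form is only practical for a small number of features.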