In machine learning, algorithm-agnostic approaches are an emerging area of research for explaining the contribution of individual features towards the predicted outcome. Whilst much of this work focuses on explaining the prediction itself, little has been done on explaining the robustness of these models, that is, how each feature contributes towards achieving that robustness. In this paper, we propose the use of Shapley values to explain the contribution of each feature towards the model's robustness, measured in terms of the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC). With the help of an illustrative example, we demonstrate the proposed idea of explaining the ROC curve and visualising the uncertainties in these curves. For imbalanced datasets, the Precision-Recall Curve (PRC) is considered more appropriate; we therefore also demonstrate how to explain PRCs with the help of Shapley values.
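The idea of attributing AUC to individual features can be illustrated with a minimal sketch. It is not the paper's implementation: the value function `coalition_auc`, which scores each row by summing the features in a coalition, is a hypothetical stand-in for retraining or re-evaluating a model on a feature subset, and the toy data `X`, `y` are invented for illustration. The empty coalition is assigned an AUC of 0.5 (chance level), so the Shapley values sum to the full-model AUC minus 0.5.

```python
from itertools import combinations
from math import factorial


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def coalition_auc(X, y, subset):
    """Hypothetical value function (an assumption, not the paper's):
    score each row by the sum of the features in `subset`.
    The empty coalition gives every row a score of 0, hence AUC = 0.5."""
    scores = [sum(row[j] for j in subset) for row in X]
    return auc(scores, y)


def shapley_auc(X, y):
    """Exact Shapley value of each feature with respect to coalition_auc."""
    n = len(X[0])
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                gain = coalition_auc(X, y, S + (i,)) - coalition_auc(X, y, S)
                phi[i] += weight * gain
    return phi


# Toy data (invented): feature 0 separates the classes well,
# feature 1 is weakly informative, feature 2 is constant.
X = [[3, 1, 5], [2, 0, 5], [1, 1, 5], [0, 0, 5], [2, 1, 5], [1, 0, 5]]
y = [1, 1, 0, 0, 1, 0]

phi = shapley_auc(X, y)
full_auc = coalition_auc(X, y, (0, 1, 2))
for j, value in enumerate(phi):
    print(f"feature {j}: Shapley contribution to AUC = {value:.4f}")
print(f"sum of contributions = {sum(phi):.4f}, full AUC - 0.5 = {full_auc - 0.5:.4f}")
```

Exact enumeration visits every coalition and is only practical for a handful of features; with more features, sampling-based approximations of the Shapley value are the usual choice. The same structure would apply to a precision-recall summary by swapping `auc` for an average-precision function.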