In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex input-output relationships. The deficiency of these methods, however, is their inability to explain the prediction process, making them difficult to trust and precarious to use in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, we associate a SHAP value that quantifies the contribution of that feature to the prediction of that sample. Clustering these SHAP values can provide insight into the data by grouping samples that not only received the same prediction, but received the same prediction for similar reasons. In doing so, we map the various pathways through which distinct samples arrive at the same prediction. To showcase this methodology, we present a simulated experiment in addition to a case study in Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We also present a novel generalization of the waterfall plot for multiclass classification.
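As a concrete illustration of the workflow described above, the sketch below computes per-sample SHAP values for a multiclass tree model and clusters them so that samples in the same group share similar explanation patterns. The `shap` Python package, the random forest, the synthetic data, and k-means are illustrative assumptions, not the exact setup used in the paper.

```python
# Minimal sketch (assumed workflow, not the authors' exact pipeline):
# fit a multiclass tree model, compute one SHAP value per
# (sample, feature, class), and cluster the resulting explanation
# vectors so that samples grouped together received similar
# predictions for similar feature-level reasons.
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# SHAP values quantify each feature's contribution to each sample's prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X).values  # (n_samples, n_features, n_classes); exact shape may vary by shap version

# Flatten the per-class contributions into one explanation vector per sample,
# then cluster: each cluster corresponds to a distinct "pathway" to a prediction.
flat = shap_values.reshape(shap_values.shape[0], -1)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(flat)
```

In this sketch, k-means stands in for whatever clustering method is ultimately applied; any algorithm that operates on the SHAP-value vectors could be substituted.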