Recent advances in deep learning have spurred interest in training deep learning models on longitudinal healthcare records to predict a range of medical events, and such models have demonstrated high predictive performance. Predictive performance is necessary but not sufficient, however: models must also provide explanations and reasoning to convince clinicians to use them on a sustained basis. Rigorous evaluation of explainability is often missing, because comparisons between models (traditional versus deep) and across explainability methods have not been well studied. Furthermore, the ground truths needed to evaluate explainability can be highly subjective, depending on the clinician's perspective. Our work is one of the first to evaluate explainability performance between and within traditional (XGBoost) and deep learning (LSTM with Attention) models at both a global and an individual per-prediction level on longitudinal healthcare data. We compared explainability using three popular methods: 1) SHapley Additive exPlanations (SHAP), 2) Layer-Wise Relevance Propagation (LRP), and 3) Attention. These methods were applied to synthetically generated datasets with designed ground truths and to a real-world Medicare claims dataset. We showed that, overall, LSTMs with SHAP or LRP provide superior explainability compared to XGBoost at both the global and local level, while LSTM with dot-product attention failed to produce reasonable explanations. With the explosion in the volume of healthcare data and continued progress in deep learning, evaluating explainability will be pivotal to the successful adoption of deep learning models in healthcare settings.
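The idea of scoring explanations against a designed ground truth can be illustrated with a minimal sketch. It is not the paper's pipeline: the toy model, the all-zero baseline, the brute-force Shapley computation (in place of the SHAP library's approximations), and the top-k overlap score are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one input, by enumerating every feature subset.

    Features outside the subset are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k (excluding feature i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    # Hypothetical synthetic model: by design, only features 0 and 2 matter.
    return 3.0 * features[0] + 1.0 * features[2]


x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]   # assumed reference input
ground_truth = {0, 2}             # features designed to be relevant

phi = exact_shapley(toy_model, x, baseline)

# Rank features by absolute attribution and compare the top-k with the design.
ranked = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
top_k = set(ranked[:len(ground_truth)])
precision = len(top_k & ground_truth) / len(ground_truth)

print("attributions:", phi)
print("top-k overlap with designed ground truth:", precision)
```

Because the toy model is linear, the attributions recover the designed coefficients; an explanation method that ranked features 1 or 3 highly would score lower on the same overlap measure.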