Explainability of black-box machine learning models is crucial, particularly when they are deployed in critical applications such as medicine or autonomous cars. Existing approaches produce explanations for the predictions of such models; however, how to assess the quality and reliability of those explanations remains an open question. In this paper we take a step further and provide the practitioner with tools to judge the trustworthiness of an explanation. To this end, we estimate the uncertainty of a given explanation by measuring the ordinal consensus amongst a set of diverse, bootstrapped surrogate explainers. We encourage diversity through ensemble techniques, and we propose and analyse metrics that aggregate the information contained in the set of explainers through a rating scheme. We empirically illustrate the properties of this approach through experiments on state-of-the-art Convolutional Neural Network ensembles. Furthermore, through tailored visualisations, we show specific examples of situations where uncertainty estimates offer concrete, actionable insights to the user beyond those arising from standard surrogate explainers.
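The sketch below illustrates the general idea described above, not the authors' implementation: it fits a set of bootstrapped local linear surrogates (in the spirit of LIME-style explainers) around one input, ranks features by attribution magnitude within each surrogate, and uses Kendall's coefficient of concordance as one possible ordinal-consensus metric. All function names (`fit_surrogate`, `kendalls_w`, `explanation_uncertainty`), the choice of Ridge surrogates, and the perturbation scheme are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's method): uncertainty of a local
# explanation as 1 minus the ordinal consensus among bootstrapped surrogates.
import numpy as np
from sklearn.linear_model import Ridge

def fit_surrogate(black_box, x, n_samples=500, scale=0.1, rng=None):
    """Fit one local linear surrogate around x on a bootstrapped perturbation set."""
    rng = rng or np.random.default_rng()
    X_pert = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y_pert = black_box(X_pert)                   # black-box scores for the perturbations
    idx = rng.integers(0, n_samples, n_samples)  # bootstrap resample for diversity
    surrogate = Ridge(alpha=1.0).fit(X_pert[idx], y_pert[idx])
    return surrogate.coef_                       # feature attributions of this explainer

def kendalls_w(rank_matrix):
    """Kendall's W over an (explainers x features) rank matrix: 1 = full agreement."""
    m, n = rank_matrix.shape
    rank_sums = rank_matrix.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def explanation_uncertainty(black_box, x, n_explainers=20, seed=0):
    """High consensus (W near 1) -> low uncertainty; low W -> unreliable explanation."""
    rng = np.random.default_rng(seed)
    attributions = np.stack([fit_surrogate(black_box, x, rng=rng)
                             for _ in range(n_explainers)])
    # Rank features by attribution magnitude within each explainer (1 = least important).
    ranks = np.argsort(np.argsort(np.abs(attributions), axis=1), axis=1) + 1
    return 1.0 - kendalls_w(ranks)

if __name__ == "__main__":
    # Toy black box: a fixed linear scorer standing in for a CNN's class logit.
    w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
    black_box = lambda X: X @ w_true
    print("estimated uncertainty:", explanation_uncertainty(black_box, np.ones(5)))
```

In this toy setting the surrogates largely agree on the important features, so the reported uncertainty is low; for a genuinely unstable explanation the rank matrix would disagree across bootstrap draws and the score would approach one.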