The expansion of explainable artificial intelligence as a field of research has generated numerous methods for visualizing and understanding the black box of a machine learning model. Attribution maps are commonly used to highlight the parts of the input image that influence the model to make a specific decision. On the other hand, the robustness of machine learning models to natural noise and adversarial attacks is also being actively explored. This paper focuses on evaluating methods of attribution mapping to determine whether robust neural networks are more explainable, within the application of classification for medical imaging. Explainability research is at an impasse: there are many methods of attribution mapping, but no current consensus on how to evaluate them and determine which are best. Our experiments on multiple datasets (natural and medical imaging) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric, called EvalAttAI, that addresses the limitations of prior metrics. Using this novel evaluation, we find that Bayesian deep neural networks trained with the Variational Density Propagation technique are consistently more explainable when used with the best-performing attribution method, the Vanilla Gradient. In general, however, robust neural networks are not necessarily more explainable, despite producing more visually plausible attribution maps.
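To make the attribution setting concrete, the sketch below shows a minimal Vanilla Gradient (saliency) attribution map in PyTorch: the gradient of the target-class score with respect to the input pixels, taken in absolute value. The model, input size, and target class are illustrative assumptions and do not reflect the paper's exact experimental setup.

```python
# Minimal sketch of a Vanilla Gradient attribution map (assumed PyTorch setup).
import torch
import torchvision.models as models

def vanilla_gradient(model, image, target_class):
    """Return |d score_target / d input| as a per-pixel attribution map."""
    model.eval()
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]      # scalar logit for the target class
    score.backward()
    # Collapse the channel dimension to obtain a single-channel saliency map.
    return image.grad.detach().abs().max(dim=1).values   # shape: (1, H, W)

# Hypothetical usage: an untrained ResNet-18 on a random 224x224 RGB image.
model = models.resnet18(weights=None)
x = torch.rand(1, 3, 224, 224)
saliency = vanilla_gradient(model, x, target_class=0)
```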