Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.
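To make the two-stage design described above concrete, the sketch below shows one way the architecture could be organized: a per-class generator produces one attribution map per finding, and a simple logistic regression layer predicts each label from its own attribution map alone. This is a minimal illustration under stated assumptions, not the authors' implementation; the module names, shapes, and the stand-in convolutional generator (the paper's actual generator is counterfactual-based) are all hypothetical.

```python
import torch
import torch.nn as nn


class AttriNetSketch(nn.Module):
    """Minimal sketch of the two-stage design described in the abstract:
    (1) a per-class generator produces a class-specific attribution map,
    (2) a per-class logistic regression predicts each label from its own
    attribution map only. All names and shapes are illustrative."""

    def __init__(self, num_classes: int, img_size: int = 320):
        super().__init__()
        # Hypothetical stand-in for the counterfactual attribution-map
        # generator; the counterfactual machinery is omitted for brevity.
        self.map_generators = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(8, 1, kernel_size=3, padding=1),
            )
            for _ in range(num_classes)
        )
        # One logistic-regression weight vector per class, applied to the
        # flattened attribution map of that class alone.
        self.classifiers = nn.ModuleList(
            nn.Linear(img_size * img_size, 1) for _ in range(num_classes)
        )

    def forward(self, x: torch.Tensor):
        logits, maps = [], []
        for gen, clf in zip(self.map_generators, self.classifiers):
            m = gen(x)                        # class-specific attribution map
            maps.append(m)
            logits.append(clf(m.flatten(1)))  # prediction from the map only
        # Sigmoid over the logits yields independent multi-label probabilities.
        return torch.cat(logits, dim=1), maps


# Usage example with assumed input size (batch of 2 grayscale X-rays):
model = AttriNetSketch(num_classes=5)
logits, maps = model(torch.randn(2, 1, 320, 320))
```

The key property this structure illustrates is that each class prediction depends solely on its own attribution map, so the map is by construction a faithful account of what drove the prediction, rather than a post-hoc approximation.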