Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks

In recent years, large strides have been made in developing machine learning methods for dermatological applications, supported in part by the success of deep learning (DL). To date, diagnosing diseases from images is one of the most explored applications of DL within dermatology. Convolutional neural networks (ConvNets) are the most common DL method in medical imaging due to their training efficiency and accuracy, although they are often described as black boxes because of their limited explainability. One popular way to gain insight into a ConvNet's decision mechanism is gradient-weighted class activation mapping (Grad-CAM). A quantitative evaluation of Grad-CAM explainability has recently been made possible by the release of DermXDB, a skin disease diagnosis explainability dataset that enables explainability benchmarking of ConvNet architectures. In this paper, we perform a literature review to identify the most common ConvNet architectures used for this task, and compare their Grad-CAM explanations with the explanation maps provided by DermXDB. We identified 11 architectures: DenseNet121, EfficientNet-B0, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, NASNetMobile, ResNet50, ResNet50V2, VGG16, and Xception. We pre-trained all architectures on a clinical skin disease dataset, and fine-tuned them on a DermXDB subset. Validation results on the DermXDB holdout subset show explainability F1 scores between 0.35 and 0.46, with Xception displaying the highest explainability performance. NASNetMobile achieves the highest characteristic-level explainability sensitivity, despite its mediocre diagnosis performance. These results highlight the importance of choosing the right architecture for the desired application and target market, underline the need for additional explainability datasets, and further confirm the need for explainability benchmarking that relies on quantitative analyses.
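
To make the comparison between Grad-CAM explanations and reference explanation maps concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the last convolutional layer's activations and the gradients of the class score with respect to them are already available as arrays of shape (H, W, K). The function names `grad_cam` and `explainability_scores`, the 0.5 binarisation threshold, and the pixel-level binary definition of F1 and sensitivity are all illustrative assumptions; the scores used in the benchmark itself may be defined differently.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from last-conv-layer activations and gradients.

    Both inputs have shape (H, W, K). Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(0, 1))  # shape (K,)
    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def explainability_scores(cam, reference_mask, threshold=0.5):
    """Pixel-level precision, sensitivity and F1 of a binarised heatmap.

    ASSUMPTION: the heatmap is binarised at `threshold` and compared with a
    binary reference mask; this is a hypothetical scoring scheme.
    """
    pred = cam >= threshold
    ref = reference_mask.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": float(precision),
        "sensitivity": float(sensitivity),
        "f1": float(f1),
    }


# Toy usage with synthetic arrays (no real model or dataset involved).
toy_activations = np.random.default_rng(0).random((7, 7, 8))
toy_gradients = np.random.default_rng(1).normal(size=(7, 7, 8))
toy_mask = np.zeros((7, 7), dtype=bool)
toy_mask[2:5, 2:5] = True
toy_scores = explainability_scores(grad_cam(toy_activations, toy_gradients), toy_mask)
```

In practice the heatmap would first be resized to the resolution of the reference map, and the threshold would be chosen on validation data; both steps are omitted here for brevity.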
