Towards Evaluating the Robustness of Deep Diagnostic Models by Adversarial Attack

Deep learning models based on neural networks have been widely used in challenging tasks such as computer-aided disease diagnosis from medical images. Recent studies have shown that deep diagnostic models may not be robust at inference time and may raise serious security concerns in clinical practice. Among the factors that undermine model robustness, adversarial examples are the most serious. An adversarial example is an input carrying a carefully designed perturbation that humans can barely perceive but that causes a deep diagnostic model to produce a wrong output with high confidence. In this paper, we evaluate the robustness of deep diagnostic models by adversarial attack. Specifically, we perform two types of adversarial attacks on three deep diagnostic models in both single-label and multi-label classification tasks, and find that these models are not reliable when attacked with adversarial examples. We further explore how adversarial examples attack the models by analyzing, for both original (clean) images and adversarial ones, the quantitative classification results, the intermediate features, the discriminability of those features, and the correlation among the estimated labels. We also design two new defense methods to handle adversarial examples in deep diagnostic models: Multi-Perturbations Adversarial Training (MPAdvT) and Misclassification-Aware Adversarial Training (MAAdvT). Experimental results show that these defense methods significantly improve the robustness of deep diagnostic models against adversarial attacks.
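The abstract does not name the two attacks it uses, so the sketch below is only a generic illustration of what an adversarial example is, not the paper's method. It applies the Fast Gradient Sign Method (FGSM), a standard single-step attack that may or may not be among those evaluated here, to a hypothetical toy logistic-regression "diagnostic" model on a flattened 28×28 input. The weights, the input, and the perturbation budget `eps = 0.03` are made-up assumptions chosen so the effect is easy to see.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression 'diagnostic' model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """FGSM: move every pixel by eps in the direction that increases the loss,
    then clip back to the valid pixel range [0, 1]."""
    g = input_gradient(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, g)]

# Hypothetical toy setup (assumed values, not from the paper).
D = 28 * 28                                        # flattened 28x28 "image"
w = [0.5 if i % 2 == 0 else -0.5 for i in range(D)]
b = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(D)]
y = 1                                              # true label: "diseased"

x_adv = fgsm(w, b, x, y, eps=0.03)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print(f"clean P(y=1)          = {p_clean:.4f}")
print(f"adversarial P(y=1)    = {p_adv:.4f}")
print(f"max per-pixel change  = {max_change:.2f}")
```

In this toy case the clean input is classified as positive with probability above 0.99, while the perturbed input drops to roughly 0.02, so the model now confidently predicts the wrong class. No pixel moves by more than 0.03. Because every change is aligned with the sign of its weight, the logit shifts by eps × ||w||₁ = 0.03 × 392 = 11.76, which is enough to flip the decision. Deep diagnostic models are nonlinear and attacks on them are usually iterative, but the same gradient-sign intuition applies.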

翻译：深层次的学习模型(与神经网络)被广泛用于具有挑战性的任务,如基于医疗图像的计算机辅助疾病诊断,最近的研究表明,深层次的诊断模型在推断过程中可能不健全,在临床实践中可能造成严重的安全关切。在所有使模型不健全的因素中,最严重的是对抗性实例。所谓的“对抗性实例”是一个设计周密的扰动模型,不易为人类所察觉,但导致深刻诊断模型的错误输出,并具有很高的自信。在本文中,我们通过对抗性攻击来评估深层次诊断模型的稳健性。具体地说,我们在单一标签和多标签分类的分类任务中对三种深层次诊断模型进行了两种对抗性攻击,发现这些模型在受到对抗性实例攻击时并不可靠。我们进一步探讨了对抗性实例是如何攻击模型的,方法是分析其定量分类结果、中间特征、特征的不稳定性以及深度诊断性诊断性模型和这些对抗性对抗性对抗性模型的估计标签的关联性结果。我们还设计了两种新的防御性方法,用以处理深层次诊断模型中的对抗性实例,即ADREDA、MISA、MAAAAADRE、MADRAVADADADADAAADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADAAAADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADADA