We investigate the influence of adversarial training on the interpretability of convolutional neural networks (CNNs), specifically applied to diagnosing skin cancer. We show that gradient-based saliency maps of adversarially trained CNNs are significantly sharper and more visually coherent than those of standardly trained CNNs. Furthermore, we show that adversarially trained networks highlight regions with significant color variation within the lesion, a common characteristic of melanoma. We find that fine-tuning a robust network with a small learning rate further improves the sharpness of the saliency maps. Lastly, we present preliminary work suggesting that robustifying the first layers so they extract robust low-level features leads to visually coherent explanations.
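The gradient-based saliency maps discussed above take the absolute gradient of the classifier's score with respect to the input pixels. A minimal, framework-free sketch of the idea is shown below on a toy linear classifier, using central finite differences in place of autograd; the model, sizes, and names here are illustrative assumptions only (a real CNN would use automatic differentiation, e.g. `torch.autograd.grad`).

```python
import numpy as np

# Hypothetical toy setup: a single linear "classifier" on an 8x8 image.
rng = np.random.default_rng(0)

H, W = 8, 8
x = rng.random((H, W))            # toy input "image"
w = rng.standard_normal((H, W))   # weights of the linear classifier

def score(img):
    """Classifier score s(img) = <w, img>."""
    return float((w * img).sum())

def saliency_map(f, img, eps=1e-4):
    """|ds/d img| via central finite differences, one pixel at a time."""
    grad = np.zeros_like(img)
    for idx in np.ndindex(img.shape):
        d = np.zeros_like(img)
        d[idx] = eps
        grad[idx] = (f(img + d) - f(img - d)) / (2 * eps)
    return np.abs(grad)

raw = saliency_map(score, x)
# For this linear model the gradient is exactly w, so raw equals |w|.
saliency = (raw - raw.min()) / (raw.max() - raw.min())  # normalize to [0, 1]
```

Sharpness comparisons like those in the abstract would then be made between such maps computed for a standardly trained and an adversarially trained network on the same lesion image.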