Interpretability is a critical factor in applying complex deep learning models to advance the understanding of brain disorders in neuroimaging studies. To interpret the decision process of a trained classifier, existing techniques typically rely on saliency maps that quantify voxel-wise or feature-level importance for the classification via partial derivatives. Although such maps provide some level of localization, they are not readily interpretable from a neuroscience perspective because they do not reveal what kind of alteration linked to the brain disorder drives the prediction. Inspired by image-to-image translation schemes, we propose to train simulator networks that warp a given image to inject or remove patterns of the disease. These networks are trained such that the classifier produces consistently increased or decreased prediction logits for the simulated images. Moreover, we couple all the simulators into a unified model based on conditional convolution. We applied our approach to interpret classifiers trained on a synthetic dataset and on two neuroimaging datasets, visualizing the effects of Alzheimer's disease and alcohol use disorder. Compared to the saliency maps generated by baseline approaches, our simulations and the visualizations based on the Jacobian determinants of the warping fields reveal meaningful and understandable patterns related to the diseases.
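To make the training objective described above concrete, the sketch below illustrates the core idea under simplifying assumptions: a simulator network predicts a dense displacement field, the field warps the input image, and the simulator is optimized so that a frozen classifier's logit for the warped image moves in a chosen direction (up to inject the disease pattern, down to remove it). This is not the authors' implementation; the 2D setting, network shapes, loss weights, and the toy classifier are all illustrative assumptions.

```python
# Minimal sketch of a logit-guided simulator network (assumed 2D, 1x64x64 inputs).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Simulator(nn.Module):
    """Predicts a dense displacement field (dx, dy) for each pixel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),  # 2 output channels: displacement
        )

    def forward(self, x):
        return self.net(x) * 0.1  # keep displacements small so the warp stays smooth

def warp(img, disp):
    """Warp `img` with displacement field `disp` via bilinear grid sampling."""
    b, _, h, w = img.shape
    # Identity sampling grid in the [-1, 1] coordinates expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = grid + disp.permute(0, 2, 3, 1)  # add the predicted displacement
    return F.grid_sample(img, grid, align_corners=True)

# Frozen binary classifier whose logit the simulator must increase or decrease
# (a stand-in; the real classifier would be a pretrained 3D CNN).
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))
for p in classifier.parameters():
    p.requires_grad_(False)

sim = Simulator()
opt = torch.optim.Adam(sim.parameters(), lr=1e-3)
direction = +1.0  # +1: inject the disease pattern, -1: remove it

for step in range(100):
    x = torch.rand(8, 1, 64, 64)            # stand-in for a batch of brain images
    disp = sim(x)
    x_sim = warp(x, disp)
    logit_gap = classifier(x_sim) - classifier(x)
    # Push the logit in the chosen direction; penalize large displacements.
    loss = -direction * logit_gap.mean() + 0.01 * disp.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, the local expansion or contraction induced by the warp can be visualized by computing the Jacobian determinant of the deformation at each voxel, which is the visualization the abstract refers to; regions with determinants far from 1 indicate where the simulator altered the anatomy to change the classifier's decision.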