Deep learning (DL) models are increasingly applied in mental state decoding, where researchers seek to understand the mapping between mental states (e.g., perceiving fear or joy) and brain activity by identifying those brain regions (and networks) whose activity allows these states to be accurately identified (i.e., decoded). Once a DL model has been trained to accurately decode a set of mental states, neuroimaging researchers often apply interpretation methods from explainable artificial intelligence research to understand the model's learned mappings between mental states and brain activity. Here, we compare the explanation performance of prominent interpretation methods in a mental state decoding analysis of three functional Magnetic Resonance Imaging (fMRI) datasets. Our findings demonstrate a gradient between two key characteristics of an explanation in mental state decoding, namely, its biological plausibility and its faithfulness: interpretation methods with high explanation faithfulness, which capture the model's decision process well, generally provide explanations that are less biologically plausible than those of interpretation methods with lower explanation faithfulness. Based on this finding, we provide specific recommendations for the application of interpretation methods in mental state decoding.
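To make the described workflow concrete, the following is a minimal, hypothetical sketch of attributing a trained decoder's predictions back to input voxels with an attribution method (here Captum's IntegratedGradients, used only as one example interpretation method; the toy decoder, voxel count, and state count are illustrative assumptions and not the architectures or methods evaluated in this work).

```python
# Sketch: voxel-level attributions for a trained mental state decoder.
# All names below (model, bold, n_voxels, n_states) are hypothetical stand-ins.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

n_voxels, n_states = 4096, 3            # hypothetical flattened volume size / number of mental states
model = nn.Sequential(                  # stand-in decoder; any trained torch.nn.Module works here
    nn.Linear(n_voxels, 128),
    nn.ReLU(),
    nn.Linear(128, n_states),
)
model.eval()

bold = torch.randn(8, n_voxels)          # stand-in batch of flattened fMRI volumes
predicted = model(bold).argmax(dim=1)    # decoded mental state per volume

ig = IntegratedGradients(model)
attributions = ig.attribute(bold, target=predicted, n_steps=50)
# `attributions` matches `bold` in shape; voxels with large absolute values
# contributed most to the decoding decision and can be projected back onto
# the brain, e.g., for comparison with meta-analytic activation maps.
```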