In explainable artificial intelligence, discriminative feature localization is critical for revealing a black-box model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and the MIT-BIH electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance, and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide statistically guaranteed interpretability of the localized features by measuring a generalized partial $R^2$. We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. On MNIST, the compact image regions localized by the proposed method are visually appealing. Similarly, on MIT-BIH, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles, and the method locates subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library dnn-locate (https://dnn-locate.readthedocs.io/en/latest/) that implements the proposed approach.