In this work, we attempt to explain the prediction of any black-box classifier from an information-theoretic perspective. For each input feature, we compare the classifier outputs with and without that feature using two information-theoretic metrics. This yields two attribution maps: an information gain (IG) map and a point-wise mutual information (PMI) map. The IG map provides a class-independent answer to "How informative is each pixel?", and the PMI map offers a class-specific answer to "How much does each pixel support a specific class?" Compared to existing methods, our method improves the correctness of the attribution maps in terms of a quantitative metric. We also provide a detailed analysis of an ImageNet classifier using the proposed method, and the code is available online.
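To make the two metrics concrete, the following is a minimal sketch and not the released implementation. The abstract does not state the formulas, so the sketch assumes the standard definitions: PMI for a class is the log-ratio of the classifier's probability with the feature to its probability without it, and IG is the expectation of that PMI under the with-feature prediction (equivalently, a KL divergence). How the "without" output is obtained is not shown here; the probability vectors and the function names `pmi` and `information_gain` are hypothetical.

```python
import math
from typing import List, Sequence


def pmi(p_with: Sequence[float], p_without: Sequence[float],
        eps: float = 1e-12) -> List[float]:
    """Per-class score: log p(y | x) - log p(y | x without the feature).

    Assumed definition; positive values mean the feature supports class y.
    """
    return [math.log(max(a, eps)) - math.log(max(b, eps))
            for a, b in zip(p_with, p_without)]


def information_gain(p_with: Sequence[float], p_without: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Class-independent score: expected PMI under p_with.

    Under the assumed definition this equals KL(p_with || p_without).
    """
    scores = pmi(p_with, p_without, eps)
    return sum(a * s for a, s in zip(p_with, scores))


# Hypothetical 3-class outputs for a single pixel (illustrative values only).
p_with = [0.7, 0.2, 0.1]      # classifier output with the pixel present
p_without = [0.4, 0.4, 0.2]   # classifier output with the pixel removed

pmi_scores = pmi(p_with, p_without)             # one value per class
ig_score = information_gain(p_with, p_without)  # one value per pixel
```

Computing `pmi_scores` for a fixed class at every pixel would give a PMI map for that class, while computing `ig_score` at every pixel would give the IG map. In this toy example the pixel supports class 0 (positive PMI) and counts against classes 1 and 2 (negative PMI).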