Because of the pervasive use of neural networks in human-sensitive applications, their interpretability is becoming an increasingly important topic in machine learning. In this work we introduce a simple way to interpret the output function of a neural network classifier that takes categorical variables as input. By exploiting a mapping between a neural network classifier and a physical energy model, we show that in these cases each layer of the network, and the logits layer in particular, can be expanded as a sum of terms that account for the contribution of each input pattern to the classification. For instance, at first order the expansion considers only the linear relation between input features and the output, while at second order pairwise dependencies between input features are also accounted for. The analysis of the contributions of each pattern, after an appropriate gauge transformation, is presented in two cases where the effectiveness of the method can be appreciated.
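To make the expansion concrete, the display below sketches the generic form such a series can take for categorical (e.g., binary-encoded) inputs, truncated at second order; the field terms $h_i$ and couplings $J_{ij}$ are illustrative notation in the spirit of energy models, not necessarily the paper's exact symbols:
$$
f(\mathbf{x}) \;\approx\; h_0 \;+\; \sum_{i} h_i\, x_i \;+\; \sum_{i<j} J_{ij}\, x_i x_j ,
$$
where the constant $h_0$ is a pattern-independent offset, the first sum collects the first-order (linear) contributions of individual input features, and the second sum collects the second-order contributions of pairwise feature dependencies.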