One principal approach for illuminating a black-box neural network is feature attribution, i.e., identifying the importance of input features for the network's prediction. The predictive information of features has recently been proposed as a proxy for their importance. So far, predictive information has only been identified for latent features, by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method yields a fine-grained identification of the information carried by input features and is agnostic to the network architecture. The core idea of our method is to place a bottleneck on the input that lets through only those input features associated with predictive latent features. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.
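The abstract does not specify the form of the bottleneck. The sketch below illustrates the general idea of a per-feature information bottleneck and how the information passing through each feature can be scored. It assumes a Gaussian-noise bottleneck in which each feature is blended with noise, Z = λX + (1 − λ)ε with ε ~ N(μ, σ²), together with the closed-form KL divergence to the prior N(μ, σ²). This is the formulation used in earlier latent-space information bottleneck attribution. It is not necessarily the authors' input bottleneck. The function names `bottleneck_sample` and `info_per_feature` are hypothetical.

```python
import numpy as np

# Assumption: a Gaussian-noise per-feature bottleneck; not the authors' exact method.

def bottleneck_sample(x, lam, mu, sigma, rng):
    """Pass x through a per-feature bottleneck (hypothetical helper).

    Z = lam * x + (1 - lam) * eps, with eps ~ N(mu, sigma^2).
    lam = 1 keeps a feature intact; lam = 0 replaces it with pure noise.
    """
    eps = rng.normal(mu, sigma, size=np.shape(x))
    return lam * x + (1.0 - lam) * eps


def info_per_feature(x, lam, mu, sigma):
    """Closed-form KL[P(Z|x) || Q(Z)] for each feature, in nats (hypothetical helper).

    Assumes P(Z|x) = N(lam*x + (1-lam)*mu, ((1-lam)*sigma)^2) and prior
    Q(Z) = N(mu, sigma^2). Larger values mean more information about x
    passes through that feature of the bottleneck.
    """
    lam = np.clip(lam, 0.0, 1.0 - 1e-6)  # lam = 1 would give infinite information
    return (-np.log1p(-lam)
            + 0.5 * (1.0 - lam) ** 2
            + 0.5 * (lam * (x - mu) / sigma) ** 2
            - 0.5)


# Usage with toy values (the mask lam is chosen by hand here, not optimized).
rng = np.random.default_rng(0)
x = np.array([0.2, 1.5, -2.0])   # toy input features
lam = np.array([0.0, 0.5, 0.9])  # hypothetical mask values
z = bottleneck_sample(x, lam, 0.0, 1.0, rng)
print(info_per_feature(x, lam, 0.0, 1.0))  # one information score per feature
```

In an attribution method built on this idea, the mask λ would be optimized, for example to preserve the prediction while penalizing the total information. The per-feature information would then serve as the attribution map. That optimization, and the step that ties the input bottleneck to the predictive latent features, are omitted from this sketch.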