There are increasing demands for understanding the behavior of deep neural networks (DNNs), spurred by growing security and transparency concerns. Owing to the multi-layer nonlinearity of DNN architectures, explaining DNN predictions remains an open problem, preventing us from gaining a deeper understanding of their mechanisms. To enhance the explainability of DNNs, we estimate input features' attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate the attribution map. Unlike previous techniques, ours neither relies on fitting a surrogate model nor requires any path integration of gradients. Both qualitative and quantitative experiments demonstrate the superior performance of NeFLAG in generating more faithful attribution maps than competing methods.
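For reference, the divergence theorem that motivates the NeFLAG formulation equates the volume integral of a vector field's divergence with the net flux of that field through the enclosing surface. The statement below uses generic vector-calculus notation (a field $\mathbf{F}$ over a region $V$ with boundary $\partial V$ and outward unit normal $\mathbf{n}$), not the paper's own symbols:

$$\int_V (\nabla \cdot \mathbf{F})\, dV \;=\; \oint_{\partial V} \mathbf{F} \cdot \mathbf{n}\, dS.$$

Intuitively, this is what lets divergence computed inside a region stand in for flux measured on its boundary, so attribution can be estimated without integrating gradients along a path.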