With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. Robustness, on the other hand, has long been a popular topic in deep learning research, yet until very recently it was rarely discussed in the context of explainability. In this tutorial paper, we begin by presenting gradient-based interpretability methods, which use gradient signals to attribute a model's decision to its input features. We then discuss how gradient-based methods can be evaluated for robustness and the role that adversarial robustness plays in obtaining meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present best practices and the attributes that should be examined before choosing an explainability method, and we conclude with future directions for research at the intersection of robustness and explainability.
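To make the gradient-based attribution idea concrete, the following is a minimal sketch of a vanilla gradient (saliency) attribution; the names `model`, `x`, and `target_class` are assumptions standing in for a trained PyTorch classifier, a single input tensor, and the class whose score is being explained.

```python
# Minimal sketch of vanilla gradient (saliency) attribution.
# Assumes a trained PyTorch classifier `model` and a single input tensor `x`
# (both hypothetical placeholders, not defined in the paper).
import torch

def gradient_saliency(model, x, target_class):
    """Attribute the score of `target_class` to input features via d(score)/d(input)."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)     # track gradients w.r.t. the input
    score = model(x.unsqueeze(0))[0, target_class]  # scalar logit for the target class
    score.backward()                                # populate x.grad with the gradient
    return x.grad.abs()                             # gradient magnitude as feature importance
```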