Given the great success of Deep Neural Networks (DNNs) and their black-box nature, the interpretability of these models has become an important issue. The majority of previous research works on the post-hoc interpretation of trained models. Recently, however, adversarial training has shown that it is possible for a model to acquire an interpretable input-gradient through training. Yet adversarial training is an inefficient way to obtain interpretability. To resolve this problem, we construct an approximation of the adversarial perturbations and discover a connection between adversarial training and amplitude modulation. Based on a digital analogy, we propose noise modulation as an efficient, model-agnostic alternative for training a model that interprets itself with input-gradients. Experimental results show that noise modulation can effectively increase the interpretability of input-gradients in a model-agnostic fashion.
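To make the training scheme concrete, the following is a minimal sketch assuming that noise modulation amounts to element-wise multiplication of each training input by random noise, in keeping with the amplitude-modulation analogy; the uniform noise distribution, its range, and the helper name noise_modulate are illustrative assumptions rather than the method's exact recipe.

    # Minimal sketch of noise modulation, under the assumption that it
    # multiplies each input element-wise by random noise (amplitude-
    # modulation analogy). Distribution and range are illustrative.
    import torch

    def noise_modulate(x: torch.Tensor, low: float = 0.5, high: float = 1.5) -> torch.Tensor:
        """Modulate a batch of inputs with multiplicative uniform noise."""
        noise = torch.empty_like(x).uniform_(low, high)  # one noise value per element
        return x * noise

    # Usage inside an otherwise standard training loop:
    # for x, y in loader:
    #     loss = criterion(model(noise_modulate(x)), y)
    #     loss.backward(); optimizer.step(); optimizer.zero_grad()

Because the modulation touches only the input batch, the sketch leaves the model architecture and loss untouched, which is consistent with the model-agnostic claim above.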