We argue that, when learning a 1-Lipschitz neural network with the dual loss of an optimal transportation problem, the gradient of the model is both the direction of the transportation plan and the direction to the closest adversarial attack. Traveling along the gradient to the decision boundary is no longer an adversarial attack but becomes a counterfactual explanation, explicitly transporting from one class to the other. Through extensive experiments on XAI metrics, we find that the simple saliency map method, applied to such networks, becomes a reliable explanation and outperforms state-of-the-art explanation approaches on unconstrained models. The proposed networks were already known to be certifiably robust, and we prove that they are also explainable with a fast and simple method.
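For intuition, here is a minimal PyTorch sketch of the two operations the abstract describes: the saliency map as the input gradient, and the counterfactual obtained by walking along that gradient to the decision boundary. The single orthogonally-initialized linear layer is only a hypothetical stand-in for a real 1-Lipschitz network trained with the dual optimal-transport loss; all names are illustrative.

```python
import torch

# Hypothetical stand-in for a 1-Lipschitz binary classifier f: R^d -> R,
# whose sign gives the predicted class. A linear layer with orthonormal
# rows is trivially 1-Lipschitz.
f = torch.nn.Linear(4, 1, bias=False)
torch.nn.init.orthogonal_(f.weight)

x = torch.randn(1, 4, requires_grad=True)
logit = f(x).sum()

# Saliency map: the input gradient, which for such networks is argued to
# align with the direction of the optimal transportation plan.
(grad,) = torch.autograd.grad(logit, x)

# Counterfactual sketch: step along the gradient until f(x) = 0. Because f
# is 1-Lipschitz, |f(x)| lower-bounds the distance to the decision boundary.
x_cf = x - logit.detach() * grad / grad.norm().pow(2)
print(f(x_cf))  # ~0: the counterfactual sits on the decision boundary
```

For the toy linear model the single step lands exactly on the boundary; for a trained 1-Lipschitz network one would iterate small steps of this form.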