Despite its effectiveness, the Vision Transformer's lack of interpretability may hinder its use in critical real-world applications. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method by introducing a faithfulness task and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open-source our implementation: https://github.com/AngelosNal/Vision-DiffMask
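To make the gating idea concrete, below is a minimal sketch (not the released Vision DiffMask implementation linked above) of how per-patch gates predicted from hidden activations could be trained to keep the model's original class distribution while masking out as much of the input as possible. The names `vit`, `gate_net`, and the assumption that `vit.forward_features` returns per-patch hidden states of shape (batch, num_patches, dim) are illustrative placeholders, not APIs from the authors' code.

```python
import torch
import torch.nn.functional as F

def gated_masking_loss(vit, gate_net, images, patch_size=16, sparsity_weight=0.1):
    """Illustrative sketch: train gates to preserve the ViT's prediction
    while keeping only a minimal subset of input patches."""
    with torch.no_grad():
        target = F.softmax(vit(images), dim=-1)            # distribution to preserve

    # Assumed interface: per-patch hidden states, shape (B, N, D), no CLS token.
    patch_hidden = vit.forward_features(images)
    gates = torch.sigmoid(gate_net(patch_hidden)).squeeze(-1)  # (B, N), gates in [0, 1]

    # Upsample per-patch gates to pixel resolution and mask the input image.
    B, N = gates.shape
    side = int(N ** 0.5)
    mask = gates.view(B, 1, side, side)
    mask = F.interpolate(mask, scale_factor=patch_size, mode="nearest")
    masked_images = images * mask

    # Keep the original predicted distribution (KL term) with few open gates (sparsity term).
    log_pred = F.log_softmax(vit(masked_images), dim=-1)
    kl = F.kl_div(log_pred, target, reduction="batchmean")
    sparsity = gates.mean()
    return kl + sparsity_weight * sparsity
```

The actual method additionally uses differentiable stochastic gates and constrained optimization rather than a fixed sparsity weight; this sketch only conveys the objective described in the abstract.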