Graph neural network (GNN) explanations have largely been produced through post-hoc introspection. While this approach has seen some success, many post-hoc explanation methods have been shown to fail at capturing a model's learned representation. Given this problem, it is worth considering how a model might be trained so that it is more amenable to post-hoc analysis. Motivated by the success of adversarial training in computer vision at producing models with more reliable representations, we propose a similar training paradigm for GNNs and analyze its impact on the models' explanations. For settings without ground-truth explanation labels, we also introduce a new metric that measures how well an explanation method utilizes a model's learned representation, and we demonstrate that adversarial training can help extract more domain-relevant insights in chemistry.
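To make the proposed training paradigm concrete, the sketch below shows one common form of adversarial training adapted to a GNN: PGD-style perturbations applied to continuous node features during training. This is a minimal illustration under assumed components, not the paper's exact procedure; the model (`SimpleGCN`), hyperparameters (`epsilon`, `alpha`, `steps`), and helper names are hypothetical.

```python
# Minimal sketch of adversarial training for a GNN (PGD on node features).
# All names and hyperparameters here are illustrative assumptions,
# not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCN(nn.Module):
    """Two-layer dense GCN with a mean readout for graph classification."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, out_dim)

    def forward(self, x, a_hat):
        # x: (N, in_dim) node features; a_hat: (N, N) normalized adjacency
        h = F.relu(a_hat @ self.lin1(x))
        h = a_hat @ self.lin2(h)
        return h.mean(dim=0)  # graph-level logits, shape (out_dim,)

def pgd_perturb(model, x, a_hat, y, epsilon=0.05, alpha=0.01, steps=5):
    """Find an L_inf-bounded feature perturbation that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta, a_hat).unsqueeze(0), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grad.sign()   # ascend the loss
            delta.clamp_(-epsilon, epsilon)  # project back into the L_inf ball
    return delta.detach()

def adversarial_step(model, optimizer, x, a_hat, y):
    """One training step: minimize the loss on perturbed features."""
    delta = pgd_perturb(model, x, a_hat, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta, a_hat).unsqueeze(0), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `adversarial_step` would be looped over a dataset of `(x, a_hat, y)` graph instances. This sketch perturbs node features because they are continuous; adversarial training that instead perturbs discrete graph structure (edges) requires different machinery and is not shown here.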