Privacy and interpretability are two important ingredients for trustworthy machine learning. We study the interplay of these two aspects in graph machine learning through graph reconstruction attacks, in which the adversary aims to reconstruct the graph structure of the training data given access to model explanations. Based on the different kinds of auxiliary information available to the adversary, we propose several graph reconstruction attacks. We show that additional knowledge of post-hoc feature explanations substantially increases the success rate of these attacks. Further, we investigate in detail the differences in attack performance across three classes of explanation methods for graph neural networks: gradient-based, perturbation-based, and surrogate model-based methods. While gradient-based explanations reveal the most about the graph structure, we find that these explanations do not always score high in utility. For the other two classes of explanations, privacy leakage increases with explanation utility. Finally, we propose a defense based on a randomized response mechanism for releasing the explanations, which substantially reduces the attack success rate. Our anonymized code is available.
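To make the flavor of the proposed defense concrete, the following is a minimal sketch (not the paper's exact mechanism) of randomized response applied entrywise to a binarized explanation mask before release; the epsilon parameter, the 0.5 threshold, and the mask shape are illustrative assumptions.

```python
# Sketch: entrywise randomized response on a binary explanation mask.
# Assumptions (not from the paper): epsilon value, thresholding of scores, mask shape.
import numpy as np

def randomized_response(mask: np.ndarray, epsilon: float, rng=None) -> np.ndarray:
    """Flip each binary entry with probability 1 / (1 + e^epsilon), keeping it
    otherwise; this satisfies epsilon-local differential privacy per entry."""
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    flip = rng.random(mask.shape) >= keep_prob
    return np.where(flip, 1 - mask, mask)

# Usage: perturb a thresholded feature-explanation mask before releasing it.
explanation_scores = np.random.rand(5, 8)            # e.g., node-by-feature attributions
binary_mask = (explanation_scores > 0.5).astype(int)  # illustrative binarization
released_mask = randomized_response(binary_mask, epsilon=1.0)
```

Smaller epsilon means more entries are flipped, which lowers the reconstruction attack's success rate at the cost of noisier released explanations.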