Graph convolutional networks (GCNs) are a powerful architecture for representation learning on documents that naturally occur as graphs, e.g., citation or social networks. However, sensitive personal information, such as documents containing people's profiles or relationships encoded as edges, is prone to privacy leaks, as the trained model might reveal the original input. Although differential privacy (DP) offers a well-founded privacy-preserving framework, GCNs pose theoretical and practical challenges due to the specifics of their training. We address these challenges by adapting differentially private gradient-based training to GCNs and conduct experiments using two optimizers on five NLP datasets in two languages. We propose a simple yet efficient method based on random graph splits that not only improves the baseline privacy bounds by a factor of 2.7 while retaining competitive F1 scores, but also provides strong privacy guarantees of ε = 1.0. We show that, under certain modeling choices, privacy-preserving GCNs achieve up to 90% of the performance of their non-private variants while formally guaranteeing strong privacy.
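To make "differentially private gradient-based training" and "random graph splits" more concrete, here is a minimal sketch in plain Python. It shows the generic DP-SGD aggregation step (clip each example's gradient to an L2 norm of at most C, sum, add Gaussian noise with standard deviation σ·C, then average) together with a random partition of the graph's nodes into disjoint subsets. Treating each subgraph as one "example" for clipping is an assumption made here for illustration, not necessarily the authors' exact procedure. The helper names `random_graph_split` and `dp_sgd_update` are hypothetical, the gradients are toy vectors rather than real GCN gradients, and the privacy accounting that turns σ and the number of steps into an ε value is omitted.

```python
import math
import random
from typing import List, Sequence


def random_graph_split(node_ids: Sequence[int], num_splits: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: shuffle node IDs and partition them into disjoint subsets.

    Assumption: each subset induces its own subgraph (edges crossing subsets
    are dropped), so every node contributes to exactly one split.
    """
    rng = random.Random(seed)
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    # Round-robin slicing yields split sizes that differ by at most one.
    return [shuffled[i::num_splits] for i in range(num_splits)]


def clip(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale `grad` so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_update(
    per_example_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical DP-SGD aggregation: clip, sum, add N(0, (sigma*C)^2) noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, clip_norm)):
            total[j] += g
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


# Usage: split 10 nodes into 3 subgraphs; pretend each yields one gradient vector.
splits = random_graph_split(range(10), num_splits=3, seed=42)
toy_grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
# noise_multiplier=0.0 disables the noise so the clipping effect is visible.
update = dp_sgd_update(toy_grads, clip_norm=1.0, noise_multiplier=0.0, rng=random.Random(0))
print(splits, update)
```

With the noise disabled, the first and third toy gradients are scaled down to norm 1 while the second (norm 0.5) passes through unchanged, so the averaged update is roughly [0.1, 0.667]. In actual private training the noise multiplier would be positive, and a privacy accountant would track the cumulative ε over all training steps.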