Graph convolutional networks (GCNs) are a powerful architecture for representation learning and prediction on documents that naturally occur as graphs, e.g., citation or social networks. Data containing sensitive personal information, such as documents with people's profiles or relationships as edges, are prone to privacy leakage from GCNs, as an adversary might recover the original input from the trained model. Although differential privacy (DP) offers a well-founded privacy-preserving framework, GCNs pose theoretical and practical challenges due to the specifics of their training. We address these challenges by adapting differentially private gradient-based training to GCNs. We investigate the impact of various privacy budgets, dataset sizes, and two optimizers in an experimental setup over five NLP datasets in two languages. We show that, under certain modeling choices, privacy-preserving GCNs achieve up to 90% of the performance of their non-private variants, while formally guaranteeing strong privacy.
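Below is a minimal, self-contained sketch (PyTorch, toy data) of the kind of DP-SGD-style training the abstract refers to: per-example gradient clipping plus calibrated Gaussian noise, applied to a one-layer GCN. All names and hyperparameters (clip_norm, noise_multiplier, lr, graph size) are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of DP-SGD-style training for a one-layer GCN.
# Note: because a node's prediction depends on its neighbors, naive per-node
# clipping alone does not yield node-level DP -- part of the challenge the
# abstract mentions.
import torch

torch.manual_seed(0)

# Toy graph: N nodes, F input features, K classes.
N, F, K = 8, 16, 3
X = torch.randn(N, F)
y = torch.randint(0, K, (N,))

# Random undirected adjacency with self-loops, symmetrically normalized.
A = (torch.rand(N, N) < 0.3).float()
A = ((A + A.t() + torch.eye(N)) > 0).float()
d_inv_sqrt = A.sum(dim=1).pow(-0.5)
A_hat = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt      # D^{-1/2} (A + I) D^{-1/2}

W = (0.1 * torch.randn(F, K)).requires_grad_()        # single GCN weight matrix

clip_norm, noise_multiplier, lr = 1.0, 1.0, 0.1       # assumed DP hyperparameters

for step in range(50):
    per_node_grads = []
    for i in range(N):                                # naive per-example gradients
        logits = A_hat @ X @ W                        # one graph convolution
        loss_i = torch.nn.functional.cross_entropy(logits[i:i + 1], y[i:i + 1])
        (g,) = torch.autograd.grad(loss_i, W)
        # Clip each per-node gradient to L2 norm <= clip_norm.
        g = g * torch.clamp(clip_norm / (g.norm() + 1e-12), max=1.0)
        per_node_grads.append(g)

    # Sum clipped gradients, add Gaussian noise scaled to the clipping bound,
    # then average and take a plain SGD step.
    noisy_grad = torch.stack(per_node_grads).sum(dim=0)
    noisy_grad += noise_multiplier * clip_norm * torch.randn_like(noisy_grad)
    noisy_grad /= N

    with torch.no_grad():
        W -= lr * noisy_grad
```

In practice, the per-example loop would be replaced by vectorized per-sample gradients (e.g., via a library such as Opacus), and the noise multiplier would be chosen to meet a target (ε, δ) budget with a privacy accountant.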