Graph convolutional networks (GCNs) have been widely adopted for graph representation learning and have achieved impressive performance. For large graphs stored separately across different clients, distributed GCN training algorithms have been proposed to improve efficiency and scalability. However, existing methods directly exchange node features between clients, which leaks private data. Federated learning has been incorporated into graph learning to address data privacy, but such methods suffer from severe performance drops due to non-IID data distributions. Moreover, these approaches generally incur heavy communication and memory overhead during training. In light of these problems, we propose a Privacy-Preserving Subgraph sampling based distributed GCN training method (PPSGCN), which preserves data privacy and significantly reduces communication and memory overhead. Specifically, PPSGCN employs a star-topology client-server system. We first sample a local node subset in each client to form a global subgraph, which greatly reduces communication and memory costs. We then perform local computation on each client with the features or gradients of the sampled nodes. Finally, all clients securely communicate with the central server using homomorphic encryption to combine local results while preserving data privacy. Compared with federated graph learning methods, our PPSGCN model is trained on a global graph, avoiding the negative impact of skewed local data distributions. We prove that PPSGCN converges to a local optimum with probability 1. Experimental results on three prevalent benchmarks demonstrate that our algorithm significantly reduces communication and memory overhead while maintaining desirable performance. Further studies not only demonstrate the fast convergence of PPSGCN but also discuss the trade-off between communication and local computation costs.
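To make the three-step pipeline concrete (local node sampling, local computation, encrypted aggregation at the server), the following is a minimal Python sketch of one PPSGCN-style training round. The `Client` class, the sampling rate, the placeholder feature-sum computation, and the use of the `phe` Paillier library are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of one PPSGCN-style round in a star-topology client-server setup.
# Assumption: the additively homomorphic Paillier scheme from the `phe`
# library stands in for the paper's homomorphic encryption component.
import random
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

class Client:
    def __init__(self, node_features):
        self.node_features = node_features  # {node_id: feature vector}

    def sample_local_nodes(self, rate=0.2):
        # Step 1: sample a local node subset; the union of these subsets
        # across clients forms the global subgraph for this round.
        nodes = list(self.node_features)
        return random.sample(nodes, max(1, int(rate * len(nodes))))

    def local_partial_result(self, sampled):
        # Step 2: local computation on the sampled nodes. A feature sum
        # is a placeholder for a GCN layer's neighborhood aggregation.
        dim = len(next(iter(self.node_features.values())))
        partial = [0.0] * dim
        for n in sampled:
            for i, v in enumerate(self.node_features[n]):
                partial[i] += v
        # Step 3 (client side): encrypt element-wise before sending,
        # so the server only ever sees ciphertexts.
        return [public_key.encrypt(v) for v in partial]

def server_aggregate(encrypted_partials):
    # Step 3 (server side): Paillier is additively homomorphic, so the
    # server can sum ciphertexts without decrypting them; raw node
    # features or gradients never leave a client.
    agg = encrypted_partials[0]
    for part in encrypted_partials[1:]:
        agg = [a + b for a, b in zip(agg, part)]
    return agg

clients = [Client({0: [1.0, 2.0], 1: [0.5, 1.5]}),
           Client({2: [2.0, 0.5], 3: [1.0, 1.0]})]
partials = [c.local_partial_result(c.sample_local_nodes()) for c in clients]
# Decryption here is only to display the result; in a real deployment the
# private key would stay with the clients, not the server.
combined = [private_key.decrypt(v) for v in server_aggregate(partials)]
print(combined)  # decrypted global aggregate over the sampled subgraph
```

Element-wise Paillier encryption is expensive for high-dimensional features; the sketch favors clarity over efficiency and only illustrates why the server can combine local results without observing any client's raw data.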