Node embedding aims to map nodes in the complex graph into low-dimensional representations. The real-world large-scale graphs and difficulties of labeling motivate wide studies of unsupervised node embedding problems. Nevertheless, previous effort mostly operates in a centralized setting where a complete graph is given. With the growing awareness of data privacy, data holders who are only aware of one vertex and its neighbours demand greater privacy protection. In this paper, we introduce FedWalk, a random-walk-based unsupervised node embedding algorithm that operates in such a node-level visibility graph with raw graph information remaining locally. FedWalk is designed to offer centralized competitive graph representation capability with data privacy protection and great communication efficiency. FedWalk instantiates the prevalent federated paradigm and contains three modules. We first design a hierarchical clustering tree (HCT) constructor to extract the structural feature of each node. A dynamic time wrapping algorithm seamlessly handles the structural heterogeneity across different nodes. Based on the constructed HCT, we then design a random walk generator, wherein a sequence encoder is designed to preserve privacy and a two-hop neighbor predictor is designed to save communication cost. The generated random walks are then used to update node embedding based on a SkipGram model. Extensive experiments on two large graphs demonstrate that Fed-Walk achieves competitive representativeness as a centralized node embedding algorithm does with only up to 1.8% Micro-F1 score and 4.4% Marco-F1 score loss while reducing about 6.7 times of inter-device communication per walk.
翻译:节点嵌入的目的是将复杂图形中的节点映射成低维表达式。 真实世界的大型图表和标签的难度激励着对不受监督的节点嵌入问题进行广泛的研究。 尽管如此, 先前的努力大多在给出完整图表的集中环境中进行。 随着数据隐私意识的不断提高, 只了解一个顶点及其邻居的数据持有者要求更大的隐私保护。 在本文中, 我们引入了基于随机行走的、 不受监督的节点嵌入算法, 该算法在这种节点级可见度图中运行, 原始图表信息还留在本地。 FedWalk旨在提供集中的竞争性图表代表能力, 并保护数据隐私保护, 并包含三个模块。 我们首先设计一个等级组合树 (HCT) 来提取每个节点的结构特征。 一个动态的缩放算法能够无缝地处理不同节点的结构性偏差。 基于构建的 HCT, 我们然后设计一个随机行走动发电机, 其中包含一个具有数据隐私保护的竞争性直径的序列, 然后用双级滚动的直路路进行。