Node embedding aims to map nodes in the complex graph into low-dimensional representations. The real-world large-scale graphs and difficulties of labeling motivate wide studies of unsupervised node embedding problems. Nevertheless, previous effort mostly operates in a centralized setting where a complete graph is given. With the growing awareness of data privacy, data holders who are only aware of one vertex and its neighbours demand greater privacy protection. In this paper, we introduce FedWalk, a random-walk-based unsupervised node embedding algorithm that operates in such a node-level visibility graph with raw graph information remaining locally. FedWalk is designed to offer centralized competitive graph representation capability with data privacy protection and great communication efficiency. FedWalk instantiates the prevalent federated paradigm and contains three modules. We first design a hierarchical clustering tree (HCT) constructor to extract the structural feature of each node. A dynamic time warping algorithm seamlessly handles the structural heterogeneity across different nodes. Based on the constructed HCT, we then design a random walk generator, wherein a sequence encoder is designed to preserve privacy and a two-hop neighbor predictor is designed to save communication cost. The generated random walks are then used to update node embedding based on a SkipGram model. Extensive experiments on two large graphs demonstrate that Fed-Walk achieves competitive representativeness as a centralized node embedding algorithm does with only up to 1.8% Micro-F1 score and 4.4% Marco-F1 score loss while reducing about 6.7 times of inter-device communication per walk.
翻译:节点嵌入的目的是将复杂图形中的节点映射成低维表达式。 真实世界的大比例图表和标签的难度激励着对不受监督的节点嵌入问题进行广泛的研究。 尽管如此, 先前的努力大多在给出完整图表的集中环境中进行。 随着数据隐私意识的不断提高, 只了解一个顶点及其邻居的数据持有者要求更大的隐私保护。 在本文中, 我们引入了基于随机的、 不受监督的无监控的节点嵌算法, 它在这样的节点级可见度图表中运行, 原始图表信息仍留在本地。 美德瓦尔克设计了集中的竞争性图表代表能力, 并提供了数据隐私保护和高通信效率。 美德瓦尔克即时将流行的封装模式包含三个模块。 我们首先设计了一个等级组合树 (HCT) 来提取每个节点的结构特征。 一个动态的时间扭曲算算法, 顺利处理不同节点的结构性偏差。 基于构建的 HCT, 我们随后设计一个随机行走动发电机, 其中的顺序是用于数据缩缩缩缩缩缩图。