We propose a decentralised "local2global" approach to graph representation learning, which can be used a priori to scale any embedding technique. Our local2global approach proceeds by first dividing the input graph into overlapping subgraphs (or "patches") and training local representations for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating, via group synchronization, the set of rigid motions that best align the local representations using information from the patch overlaps. A key distinguishing feature of local2global relative to existing work is that patches are trained independently, without the need for the often costly parameter synchronization during distributed training. This allows local2global to scale to large-scale industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. We apply local2global to data sets of different sizes and show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification. We also consider the downstream task of anomaly detection and show how one can use local2global to highlight anomalies in cybersecurity networks.
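For intuition, here is a minimal sketch of the pairwise alignment step underlying the approach, assuming the patch embeddings are plain NumPy arrays and that `overlap_ref` / `overlap_patch` index the shared nodes in each patch's local ordering (both names are illustrative, not from the original). It estimates a single rigid motion (orthogonal transformation plus translation) via orthogonal Procrustes on the overlap; the full local2global pipeline instead reconciles such pairwise estimates jointly across all patches through group synchronization.

```python
import numpy as np

def align_patch(X_ref, X_patch, overlap_ref, overlap_patch):
    """Estimate the rigid motion that best maps X_patch onto X_ref,
    using only the embeddings of the nodes shared by both patches.

    X_ref, X_patch : (n_i, d) local embeddings of two overlapping patches.
    overlap_ref, overlap_patch : index arrays selecting the same overlap
        nodes in each patch's local node ordering.
    """
    A = X_ref[overlap_ref]      # overlap coordinates in the reference frame
    B = X_patch[overlap_patch]  # overlap coordinates in the patch's own frame

    # Centre both point sets so rotation and translation can be estimated separately.
    mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mu_A, B - mu_B

    # Orthogonal Procrustes: find orthogonal R minimising ||A0 - B0 @ R||_F.
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    R = U @ Vt

    # Translation that lines up the overlap centroids after rotation.
    t = mu_A - mu_B @ R

    # Apply the estimated rigid motion to the whole patch embedding.
    return X_patch @ R + t
```

Aligning patches one by one like this would accumulate error along chains of overlaps; group synchronization avoids this by estimating all patch motions simultaneously from the full set of pairwise overlap constraints.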