Graphs are ubiquitous, and they can model the unique characteristics and complex relations of real-life systems. Although applying machine learning (ML) to graphs is promising, their raw representation is not suitable for ML algorithms. Graph embedding represents each node of a graph as a d-dimensional vector, which is more suitable for ML tasks. However, the embedding process is expensive, and CPU-based tools do not scale to real-world graphs. In this work, we present GOSH, a GPU-based tool for embedding large-scale graphs with minimal hardware constraints. GOSH employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work for embedding. It also incorporates a decomposition schema that enables any arbitrarily large graph to be embedded with a single GPU. As a result, GOSH sets a new state of the art in link prediction in both accuracy and speed, and delivers high-quality embeddings for node classification in a fraction of the time required by the state of the art. For instance, it can embed a graph with over 65 million vertices and 1.8 billion edges in less than 30 minutes on a single GPU.