Large-scale graph optimization problems arise in many fields. This paper presents an extensible, high-performance framework (named OpenGraphGym-MG) that uses deep reinforcement learning (RL) and graph embedding to solve large graph optimization problems on multiple GPUs. The paper uses a common RL algorithm (deep Q-learning) and a representative graph embedding (structure2vec) to demonstrate the extensibility of the framework and, most importantly, to illustrate novel optimization techniques such as spatial parallelism, graph-level and node-level batched processing, distributed sparse graph storage, efficient parallel RL training and inference algorithms, repeated gradient descent iterations, and adaptive multiple-node selection. This study performs a comprehensive analysis of parallel efficiency and memory cost that shows the parallel RL training and inference algorithms are efficient and highly scalable across GPUs. This study also conducts a range of large-graph experiments, with both generated graphs (over 30 million edges) and real-world graphs, using a single compute node (with six GPUs) of the Summit supercomputer. Good scalability is achieved in both RL training and inference: as the number of GPUs increases from one to six, the time for a single step of RL training on large graphs with more than 30 million edges is reduced from 316.4 s to 54.5 s, and for a single step of RL inference from 23.8 s to 3.4 s. These single-node results lay a solid foundation for future work addressing graph optimization problems with a large number of GPUs across multiple nodes of Summit.
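To make the deep Q-learning plus structure2vec combination concrete, the sketch below shows a minimal, single-GPU-free NumPy version of the idea: a fixed number of synchronous neighbor-aggregation iterations produces per-node embeddings, and Q-values over candidate nodes are computed from each node embedding together with a pooled graph embedding. All parameter names (`theta1`..`theta4`), dimensions, the ReLU nonlinearity, and the sum pooling are illustrative assumptions in the spirit of structure2vec, not the framework's actual implementation (which is parallelized and batched across GPUs).

```python
import numpy as np

def structure2vec(adj, x, theta1, theta2, iters=4):
    """Toy structure2vec-style embedding.

    adj: (n, n) dense adjacency matrix of the input graph.
    x:   (n,) node features, e.g. 1 if the node is already in the
         partial solution, 0 otherwise.
    theta1: (p,) feature weights; theta2: (p, p) aggregation weights.
    Returns mu: (n, p) per-node embeddings after `iters` rounds of
    synchronous neighbor aggregation.
    """
    n = adj.shape[0]
    mu = np.zeros((n, theta1.shape[0]))
    for _ in range(iters):
        agg = adj @ mu                                # sum neighbor embeddings
        mu = np.maximum(0.0, np.outer(x, theta1) + agg @ theta2)  # ReLU
    return mu

def q_values(mu, theta3, theta4):
    """Q(state, node) for every node, from per-node embeddings `mu`
    and a sum-pooled graph embedding (both linearly transformed,
    concatenated, and reduced to a scalar per node)."""
    pooled = mu.sum(axis=0)                           # graph-level embedding
    n = mu.shape[0]
    feats = np.concatenate(
        [np.tile(pooled @ theta3, (n, 1)), mu @ theta4], axis=1)
    w = np.ones(feats.shape[1]) / feats.shape[1]      # toy readout weights
    return feats @ w                                  # shape (n,)
```

In a greedy RL inference step one would pick `np.argmax` of the Q-values over the not-yet-selected nodes, add that node to the partial solution, and re-embed; the paper's adaptive multiple-node selection generalizes this by taking several top-ranked nodes per step.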