OpenGraphGym-MG:利用强化学习解决多GPU系统大图优化问题 (OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems)

Large scale graph optimization problems arise in many fields. This paper presents an extensible, high performance framework (namedOpenGraphGym-MG) that uses deep reinforcement learning and graph embedding to solve large graph optimization problems with multiple GPUs. The paper uses a common RL algorithm (deep Q-learning) and a representative graph embedding (structure2vec) to demonstrate the extensibility of the framework and, most importantly, to illustrate the novel optimization techniques, such as spatial parallelism, graph-level and node-level batched processing, distributed sparse graph storage, efficient parallel RL training and inference algorithms, repeated gradient descent iterations, and adaptive multiple-node selections. This study performs a comprehensive performance analysis on parallel efficiency and memory cost that proves the parallel RL training and inference algorithms are efficient and highly scalable on a number of GPUs. This study also conducts a range of large graph experiments, with both generated graphs (over 30 million edges) and real-world graphs, using a single compute node (with six GPUs) of the Summit supercomputer. Good scalability in both RL training and inference is achieved: as the number of GPUs increases from one to six, the time of a single step of RL training and a single step of RL inference on large graphs with more than 30 million edges, is reduced from 316.4s to 54.5s, and 23.8s to 3.4s, respectively. The research results on a single node lay out a solid foundation for the future work to address graph optimization problems with a large number of GPUs across multiple nodes in the Summit.

翻译：大型图形优化问题在许多领域出现。本文展示了一个可扩展的高性能框架( 名为 OpenGraphGym- MG ), 用于深强化学习和图形嵌入, 以解决多个 GPU 的大型图形优化问题。纸张使用共同 RL 算法( 深度 Q 学习) 和具有代表性的图形嵌入( 结构2vec ), 以展示框架的可扩展性, 最重要的是, 演示新型优化技术, 如空间平行、图形水平和节级分解处理、分布的稀薄的图表存储、高效的平行 RLL 培训和推断算法、重复的梯度下移和适应性多节点选择。此研究对平行效率和记忆成本进行了全面的绩效分析, 证明平行 RL 培训和推断算法( 结构2300万边缘) 和真实世界的图形都进行了一系列大型的图形实验, 使用单一的匹配节点( 6 GPUPO ), 和的单个直径直径直径直径直径直径直至根基, 的G 直径直到直径直径直至。 GL 的测为30, 列列列的测测测测的测的测距的测距距距距距距距距距距距距距距距距距距距距距距距距距距为30,, 直距距距距上一个大的直距为直距上一个大的直径直径直径直距距, 。直径直径的测距距上的直距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距为直距为直距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距距