A Scalable Deep Reinforcement Learning Model for Online Scheduling Coflows of Multi-Stage Jobs for High Performance Computing

Coflow is a recently proposed networking abstraction that helps improve the communication performance of data-parallel computing jobs. In multi-stage jobs, each job consists of multiple coflows and is represented by a Directed Acyclic Graph (DAG). Scheduling coflows efficiently is critical to the performance of data-parallel computing in data centers. In contrast to hand-tuned scheduling heuristics, the existing DeepWeave approach [1] uses a Reinforcement Learning (RL) framework to generate highly efficient coflow scheduling policies automatically. It employs a graph neural network (GNN) to encode the job information in a set of embedding vectors, and feeds a flat embedding vector containing the whole job information to the policy network. However, this method scales poorly: it cannot cope with jobs represented by DAGs of arbitrary sizes and shapes, because the flat, high-dimensional embedding vector requires a large policy network that is difficult to train. In this paper, we first use a directed acyclic graph neural network (DAGNN) to process the input and propose a novel Pipelined-DAGNN, which effectively speeds up the feature extraction process of the DAGNN. Next, we feed the policy network an embedding sequence composed of the schedulable coflows, instead of a flat embedding of all coflows, and output a priority sequence. As a result, the size of the policy network depends only on the feature dimension rather than on the product of the feature dimension and the number of nodes in the job's DAG. Furthermore, to improve the accuracy of the priority scheduling policy, we incorporate the Self-Attention Mechanism into the deep RL model to capture the interaction between different parts of the embedding sequence, so that the output priority scores reflect these interactions. Based on this model, we then develop a coflow scheduling algorithm for online multi-stage jobs.
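To illustrate why a sequence-based policy network can be independent of the DAG size, the following is a minimal sketch, not the architecture from the paper. It assumes a single self-attention head with untrained random weights and a linear scoring layer; the names `make_params`, `count_params`, `priority_scores`, and `schedule_order` are hypothetical and chosen only for this example. Each schedulable coflow is assumed to arrive as a feature vector of length `d`, as a DAGNN-style encoder might produce.

```python
import math
import random


def make_params(d, seed=0):
    """Random, untrained weights. Their shapes depend only on the feature size d."""
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

    return {"Wq": mat(), "Wk": mat(), "Wv": mat(),
            "w_out": [rng.gauss(0.0, 1.0) for _ in range(d)]}


def count_params(params):
    """Total number of scalar weights: three d x d matrices plus one d-vector."""
    mats = sum(len(params[k]) * len(params[k][0]) for k in ("Wq", "Wk", "Wv"))
    return mats + len(params["w_out"])


def _project(x, W):
    # Row vector x (length d) multiplied by matrix W (d x d).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def _softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def priority_scores(embeddings, params):
    """Return one score per schedulable coflow, for a sequence of any length."""
    d = len(embeddings[0])
    Q = [_project(x, params["Wq"]) for x in embeddings]
    K = [_project(x, params["Wk"]) for x in embeddings]
    V = [_project(x, params["Wv"]) for x in embeddings]
    scores = []
    for q in Q:
        # Scaled dot-product attention of this coflow over the whole sequence.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        attn = _softmax(logits)
        context = [sum(attn[j] * V[j][c] for j in range(len(V))) for c in range(d)]
        # Linear scoring layer (an assumption of this sketch).
        scores.append(sum(w * c for w, c in zip(params["w_out"], context)))
    return scores


def schedule_order(embeddings, params):
    """Indices of the coflows sorted from highest to lowest priority score."""
    scores = priority_scores(embeddings, params)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With `d = 4`, the same parameter set scores a sequence of 3 coflows or of 7 coflows without any change, and `count_params` stays at `3 * d * d + d`. A flat embedding of all DAG nodes would instead need an input layer whose width grows with the number of nodes.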
