Learning to Optimize DAG Scheduling in Heterogeneous Environment

Directed acyclic graph (DAG) scheduling in a heterogeneous environment aims to assign on-the-fly jobs to a cluster of heterogeneous computing executors so as to minimize the makespan while meeting all scheduling requirements. The problem has drawn more attention than ever with the rapid development of heterogeneous cloud computing. Even a small reduction in the makespan of DAG scheduling can both bring large profits to service providers and raise the level of service for users. Although DAG scheduling plays an important role in the cloud computing industry, existing solutions still leave considerable room for improvement, especially in exploiting the topological dependencies between jobs. In this paper, we propose a task-duplication-based learning algorithm, called \textit{Lachesis}, for the distributed DAG scheduling problem. Our approach first perceives the topological dependencies between jobs using a specially designed graph convolutional network (GCN) and selects the task most likely to be executed next. The selected task is then assigned to a specific executor, with consideration of duplicating all its predecessor tasks, according to a sophisticated heuristic method. We have conducted extensive experiments on standard workload data to evaluate our solution. The experimental results suggest that the proposed algorithm achieves up to a 26.7\% reduction in makespan and a 35.2\% improvement in speedup ratio over seven strong baseline algorithms, including state-of-the-art heuristic methods and a variety of deep reinforcement learning based algorithms.
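To make the two-stage structure described above concrete (pick a ready task, then place it on an executor), the following is a minimal Python sketch and not the Lachesis implementation. Everything in it is an assumption made for illustration: the function name `schedule`, the `score` callback standing in for the GCN's task-selection output, a single fixed communication delay `comm` between different executors, and an earliest-finish-time placement rule swapped in for the paper's heuristic. Task duplication is deliberately omitted.

```python
from typing import Callable, Dict, List, Tuple

# Placement record for one task: (executor index, start time, finish time).
Placement = Tuple[int, float, float]


def schedule(
    succ: Dict[str, List[str]],
    work: Dict[str, float],
    speeds: List[float],
    comm: float,
    score: Callable[[str], float],
) -> Tuple[float, Dict[str, Placement]]:
    """Greedy two-stage scheduling sketch (hypothetical, no task duplication).

    succ   : DAG as task -> list of successor tasks
    work   : task -> amount of work
    speeds : per-executor speed (heterogeneous cluster)
    comm   : assumed fixed delay when a predecessor ran on another executor
    score  : placeholder for a learned task-selection score
    """
    # Build the predecessor lists from the successor lists.
    pred: Dict[str, List[str]] = {t: [] for t in work}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    placed: Dict[str, Placement] = {}
    free_at = [0.0] * len(speeds)  # time at which each executor becomes idle

    while len(placed) < len(work):
        # Stage 1: among tasks whose predecessors are all placed,
        # select the one with the highest score.
        ready = [
            t for t in work
            if t not in placed and all(p in placed for p in pred[t])
        ]
        task = max(ready, key=score)

        # Stage 2: place the task on the executor with the earliest finish time.
        best = None
        for e, speed in enumerate(speeds):
            data_ready = max(
                (placed[p][2] + (0.0 if placed[p][0] == e else comm)
                 for p in pred[task]),
                default=0.0,
            )
            start = max(free_at[e], data_ready)
            finish = start + work[task] / speed
            if best is None or finish < best[2]:
                best = (e, start, finish)

        placed[task] = best
        free_at[best[0]] = best[2]

    makespan = max(finish for _, _, finish in placed.values())
    return makespan, placed


# Toy diamond-shaped DAG: a -> {b, c} -> d, on two executors of speed 1 and 2.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
work = {"a": 2.0, "b": 4.0, "c": 4.0, "d": 2.0}
makespan, placed = schedule(succ, work, speeds=[1.0, 2.0], comm=1.0,
                            score=lambda t: work[t])
print(makespan)  # 6.0
```

In this toy run every task ends up on the faster executor, because the assumed communication delay makes moving `b` or `c` to the slower one unprofitable. Duplicating a predecessor on a second executor is one way to avoid such a delay, which is the kind of trade-off a task-duplication heuristic has to weigh.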
