Scientific workflows typically comprise many different processing steps, which are often executed in parallel on different partitions of the input data. These executions, in turn, must be scheduled on the compute nodes of the available computational infrastructure. This assignment is complicated by the fact that (a) tasks typically have highly heterogeneous resource requirements and (b) in many infrastructures, compute nodes offer highly heterogeneous resources. Consequently, predictions of the runtime of a given task on a given node, as required by many scheduling algorithms, are often imprecise, which can lead to sub-optimal scheduling decisions. We propose Reshi, a method for recommending task-node assignments during workflow execution that can cope with heterogeneous tasks and heterogeneous nodes. Reshi approaches the problem as a regression task, in which task-node pairs are modeled as feature vectors over the results of dedicated microbenchmarks and past task executions. Based on these features, Reshi trains a regression tree model to rank and recommend nodes for each ready-to-run task; these recommendations can be used as input to a scheduler. For our evaluation, we benchmarked 27 AWS machine types using three representative workflows. We compare Reshi's recommendations with three state-of-the-art schedulers. Our evaluation shows that Reshi outperforms HEFT by a mean makespan reduction of 7.18% and 18.01% assuming a mean task runtime prediction error of 15%.
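To make the idea of ranking nodes with a regression tree concrete, the following is a minimal sketch and not the Reshi implementation. It assumes that each task-node pair is described by four hypothetical features (the task's CPU and I/O work taken from past executions, and the node's CPU and I/O microbenchmark scores), that the regression target is the observed runtime, and that nodes are ranked by ascending predicted runtime. The feature names, the synthetic runtimes, and the small hand-written tree learner are all illustrative assumptions.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3):
    """Greedily fit a small regression tree that minimises squared error."""
    if depth >= max_depth or len(rows) < 2 or len(set(targets)) == 1:
        return {"leaf": mean(targets)}
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, feature, threshold, left, right)
    if best is None:
        return {"leaf": mean(targets)}
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left],
                           [targets[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [targets[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk the tree until a leaf is reached and return its mean runtime."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def rank_nodes(tree, task_features, nodes):
    """Rank node names by ascending predicted runtime for one ready task."""
    scored = [(predict(tree, task_features + feats), name)
              for name, feats in nodes.items()]
    return [name for _, name in sorted(scored)]


# Hypothetical microbenchmark scores per node: [cpu_score, io_score].
NODES = {"fast_cpu": [2.0, 1.0], "fast_io": [1.0, 2.0], "slow": [1.0, 1.0]}
# Hypothetical task profiles from past executions: [cpu_work, io_work].
TASKS = {"cpu_bound": [10.0, 1.0], "io_bound": [1.0, 10.0]}

# Synthetic "past executions": runtime = cpu_work/cpu_score + io_work/io_score.
rows, runtimes = [], []
for task in TASKS.values():
    for node in NODES.values():
        rows.append(task + node)
        runtimes.append(task[0] / node[0] + task[1] / node[1])

tree = build_tree(rows, runtimes)
cpu_ranking = rank_nodes(tree, TASKS["cpu_bound"], NODES)
io_ranking = rank_nodes(tree, TASKS["io_bound"], NODES)
```

On this toy data, the CPU-bound task ranks `fast_cpu` first and the I/O-bound task ranks `fast_io` first, with `slow` last in both cases. A scheduler could consume such a ranking as a preference list rather than relying on a single point estimate of runtime.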