Executing workflows on volunteer computing resources, where individual tasks may be forced to relinquish their resource when it is reclaimed for its primary use, leads to unpredictability and often significantly increased execution times. Task replication is one approach that can ameliorate this challenge, but it comes at the expense of a potentially significant increase in system load and energy consumption. We propose the use of Reinforcement Learning (RL) such that a system may `learn' the `best' number of replicas to run, increasing the number of workflows which complete promptly whilst minimising the additional workload placed on the system when replicas are not beneficial. We show, through simulation, that RL can save 34% of the energy consumption compared with running a fixed number of replicas, with only a 4% decrease in the proportion of workflows meeting a pre-defined overhead bound.
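To make the replica-selection decision concrete, the following is a minimal, illustrative sketch of a tabular Q-learning agent that chooses how many replicas to launch for a task. The action set, state encoding, reward weighting, and all identifiers are assumptions for illustration only, not the formulation used in the paper.

```python
import random
from collections import defaultdict

# Illustrative sketch (not the paper's implementation): a tabular Q-learning
# agent picks a replica count per task, trading prompt completion against the
# extra energy cost of redundant work.

ACTIONS = [1, 2, 3, 4]            # candidate replica counts (assumed range)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)            # Q[(state, action)] -> value estimate

def choose_replicas(state):
    """Epsilon-greedy choice of replica count for the observed state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def reward(met_overhead_bound, replicas_used):
    """Assumed reward: bonus for meeting the overhead bound, minus a
    penalty proportional to the energy of the extra replicas."""
    return (1.0 if met_overhead_bound else 0.0) - 0.1 * (replicas_used - 1)

def update(state, action, r, next_state):
    """One-step Q-learning update applied once the task (and its replicas) finish."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

In such a scheme the agent would learn to launch extra replicas only in states where volunteer resources are likely to be reclaimed, which is the behaviour the abstract describes: fewer wasted replicas, hence lower energy consumption, at a small cost in the fraction of workflows meeting the overhead bound.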