Distributed computing, in which a resource-intensive task is divided into subtasks and distributed among different machines, plays a key role in solving large-scale problems. Coded computing is a recently emerging paradigm where redundancy for distributed computing is introduced to alleviate the impact of slow machines (stragglers) on the completion time. We investigate coded computing solutions over elastic resources, where the set of available machines may change in the middle of the computation. This is motivated by recently available services in the cloud computing industry (e.g., EC2 Spot, Azure Batch) where low-priority virtual machines are offered at a fraction of the price of the on-demand instances but can be preempted on short notice. Our contributions are three-fold. We first introduce a new concept called transition waste that quantifies the number of tasks existing machines must abandon or take over when a machine joins/leaves. We then develop an efficient method to minimize the transition waste for the cyclic task allocation scheme recently proposed in the literature (Yang et al. ISIT'19). Finally, we establish a novel solution based on finite geometry achieving zero transition wastes given that the number of active machines varies within a fixed range.
翻译:分布式计算是资源密集型任务分为子任务和在不同机器之间分布的,在解决大规模问题方面发挥着关键作用。 编码计算是最近出现的一种模式,对分布式计算进行冗余,以缓解慢速机(减压器)在完成时间的影响。 我们调查弹性资源的编码计算解决方案,在计算过程中,一套可用的机器可能会发生变化。 这是由最近提供的云计算行业(例如EC2 Spot、Azure Batch)的服务推动的。 云计算行业(例如,EC2 Spot、Azure Batch)以按需价格的一小部分价格提供低优先虚拟机器,但可在短时间内提前完成。 我们的贡献是三重。 我们首先引入了一种称为过渡性废物的新概念,即对现有机器的任务数量进行量化,在机器加入/离开时必须放弃或接管。 然后我们开发一种有效的方法,以最大限度减少循环任务分配计划(Yang等人、Azure Batch)中最近提出的过渡废物(ISIT'19)。 最后,我们根据有限的地理测量方法,在固定的等级范围内实现零转移废物数量。</s>