In recent years, coded distributed computing (CDC) has attracted significant attention because it can efficiently support many delay-sensitive computation tasks in the presence of unexpected latencies in distributed computing systems. Despite this salient feature, many design challenges and opportunities remain. In this paper, we focus on practical computing systems with heterogeneous computing resources and design a novel CDC approach, called batch-processing based coded computing (BPCC), which exploits the fact that every computing node can produce some coded results before it completes its whole task. To this end, we first describe the main idea of the BPCC framework and then formulate an optimization problem for BPCC that minimizes the task completion time by configuring the computation load. Through formal theoretical analyses, extensive simulation studies, and comprehensive real experiments on Amazon EC2 computing clusters, we demonstrate the promising performance of the proposed BPCC scheme in terms of high computational efficiency and robustness to uncertain disturbances.
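The abstract does not specify the BPCC system model, so the following is only a minimal Python sketch of the batch-processing idea under stated assumptions. It is not the authors' formulation. We assume the master can decode once it holds any `k` coded rows (for example, from an MDS code). We also assume that worker `i` spends `per_row[i]` seconds per coded row after a start-up `delay[i]`, and that it returns each batch as soon as that batch finishes. The names `batch_finish_events`, `completion_time`, and `best_loads` are hypothetical. `best_loads` is a naive exhaustive search that stands in for the paper's load-configuration optimization.

```python
from itertools import product
from typing import List, Tuple


def batch_finish_events(load: int, n_batches: int, per_row: float,
                        delay: float) -> List[Tuple[float, int]]:
    """(finish_time, rows) for each batch one worker sends back.

    Assumption: the load is split into n_batches near-equal batches
    (n_batches >= 1), and each batch is returned as soon as it finishes.
    """
    base, extra = divmod(load, n_batches)
    events, t = [], delay
    for b in range(n_batches):
        size = base + (1 if b < extra else 0)
        if size == 0:  # more batches than rows: skip empty batches
            continue
        t += size * per_row
        events.append((t, size))
    return events


def completion_time(k: int, loads: List[int], batches: List[int],
                    per_row: List[float], delay: List[float]) -> float:
    """Earliest time at which the master holds >= k coded rows (inf if never)."""
    events: List[Tuple[float, int]] = []
    for load, nb, c, d in zip(loads, batches, per_row, delay):
        events.extend(batch_finish_events(load, nb, c, d))
    received = 0
    for t, size in sorted(events):  # process batch arrivals in time order
        received += size
        if received >= k:
            return t
    return float("inf")


def best_loads(k: int, n_total: int, per_row: List[float],
               delay: List[float]) -> Tuple[List[int], float]:
    """Naive exhaustive search over integer splits of n_total coded rows,
    using one row per batch; a hypothetical stand-in for a real optimizer."""
    best, best_t = [], float("inf")
    for loads in product(range(n_total + 1), repeat=len(per_row)):
        if sum(loads) != n_total:
            continue
        batches = [max(ld, 1) for ld in loads]  # one row per batch
        t = completion_time(k, list(loads), batches, per_row, delay)
        if t < best_t:
            best, best_t = list(loads), t
    return best, best_t


if __name__ == "__main__":
    # Two heterogeneous workers: worker 1 is twice as slow per row.
    # k = 6 rows are needed to decode; 8 coded rows are assigned in total.
    speed, start = [1.0, 2.0], [0.0, 0.0]
    print(completion_time(6, [4, 4], [1, 1], speed, start))  # 8.0: whole loads only
    print(completion_time(6, [4, 4], [4, 4], speed, start))  # 4.0: per-row batches
    print(best_loads(6, 8, speed, start))                     # ([4, 4], 4.0)
```

With a single batch per worker, the master must wait for the slow worker to finish its entire load. With per-row batches, the fast worker's early results count toward the `k` rows immediately, and so do the slow worker's early results. In this toy model, that is the effect BPCC is described as exploiting.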