We consider the problem of coded distributed computing, in which a large linear computational job, such as a matrix multiplication, is divided into k smaller tasks, encoded using an (n,k) linear code, and performed over n distributed nodes. The goal is to reduce the average execution time of the computational job. We establish a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length n used over erasure channels. Accordingly, we present closed-form expressions for the execution time using binary random linear codes and for the best execution time that any linear-coded distributed computing system can achieve. We also show that there exist good binary linear codes that not only attain (asymptotically) the best performance that any linear code (not necessarily binary) can achieve but are also numerically stable against the rounding errors that are inevitable in practice. We then develop a low-complexity algorithm for decoding Reed-Muller (RM) codes over erasure channels. Our decoder involves only additions and subtractions and enables coded computation over real-valued data. Extensive numerical analysis of the fundamental results, as well as of RM- and polar-coded computing schemes, demonstrates that RM-coded computation achieves close-to-optimal performance while offering low-complexity decoding and an explicit construction. The proposed framework enables efficient designs of distributed computing systems by drawing on the rich literature in channel coding theory.
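To make the setup concrete, the following is a minimal sketch of coded matrix-vector multiplication in which a straggling node plays the role of an erasure. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the (3,2) single-parity generator matrix `G`, the helper names `encode_blocks`, `matvec`, and `decode`, and the generic Gauss-Jordan decoder over exact rationals are all chosen here for illustration. In particular, the decoder below is not the low-complexity RM decoder described above.

```python
from fractions import Fraction

def encode_blocks(blocks, G):
    """Build one coded block per node: node i gets sum_j G[i][j] * blocks[j].
    With a binary G this needs only additions of the uncoded blocks."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for g in G:
        acc = [[0] * cols for _ in range(rows)]
        for coeff, blk in zip(g, blocks):
            if coeff:
                for r in range(rows):
                    for c in range(cols):
                        acc[r][c] += coeff * blk[r][c]
        coded.append(acc)
    return coded

def matvec(M, x):
    """Plain matrix-vector product; this is the task each node performs."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def decode(G, results):
    """Recover the k uncoded sub-results from the nodes that finished.
    `results` maps node index -> output vector. Uses Gauss-Jordan elimination
    over exact rationals (an assumption made for clarity). Returns None when
    the rows of G for the finished nodes do not have rank k, i.e. when the
    erasure pattern is not decodable."""
    k = len(G[0])
    aug = [[Fraction(v) for v in G[i]] + [Fraction(v) for v in y]
           for i, y in results.items()]
    for col in range(k):
        pivot = next((r for r in range(col, len(aug)) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(len(aug)):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[j][k:] for j in range(k)]

# Hypothetical toy job: compute A @ x with k = 2 tasks on n = 3 nodes.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [2, 1]
G = [[1, 0], [0, 1], [1, 1]]          # assumed (3,2) single-parity code
blocks = [A[:2], A[2:]]               # k = 2 row blocks of A
coded = encode_blocks(blocks, G)
outputs = {i: matvec(M, x) for i, M in enumerate(coded)}

finished = {i: outputs[i] for i in (1, 2)}   # node 0 straggles (an erasure)
parts = decode(G, finished)
recovered = [v for part in parts for v in part]
print(recovered == matvec(A, x))      # prints True
```

In this toy setting the job completes as soon as the finished nodes form a decodable set, which is why the execution time of the coded system is tied to the decodability of erasure patterns for the code.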