Distributed computation is a framework for breaking a complex computational task into smaller subtasks and distributing them among computational nodes. Erasure-correcting codes have recently been introduced as a popular remedy for the well-known ``straggling nodes'' problem, in particular by matching linear codes to linear computation tasks. It has been observed, however, that decoding tends to amplify the computation ``noise'', i.e., the numerical errors at the computation nodes. We propose taking advantage of the case in which more nodes return than the minimum required. We show how a clever construction of a polynomial code, inspired by recent results on robust frames, can significantly reduce the amplification of noise and achieve graceful degradation with the number of straggler nodes.
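As a rough illustration of the noise-amplification effect described above, the following sketch encodes k coefficients with a polynomial (Vandermonde) code and decodes by least squares from whichever nodes return. It is not the construction proposed in the paper: the choice of the n-th roots of unity as evaluation points, the least-squares decoder, and the names `encode_matrix`, `decode`, and `noise_gain` are all assumptions made for illustration only.

```python
import numpy as np

def encode_matrix(n, k):
    """n x k Vandermonde generator matrix; row i evaluates a degree-(k-1)
    polynomial at the i-th of the n-th roots of unity (an assumed choice)."""
    points = np.exp(2j * np.pi * np.arange(n) / n)
    return np.vander(points, k, increasing=True)

def decode(G, returned, y):
    """Least-squares estimate of the k coefficients from the results y
    of the nodes listed in `returned` (needs at least k of them)."""
    coeffs, *_ = np.linalg.lstsq(G[returned], y, rcond=None)
    return coeffs

def noise_gain(G, returned):
    """Total decoding MSE per unit noise variance, trace((A^H A)^-1),
    assuming i.i.d. additive noise on the returned results."""
    s = np.linalg.svd(G[returned], compute_uv=False)
    return float(np.sum(1.0 / s**2))

n, k = 12, 4                      # hypothetical sizes
G = encode_matrix(n, k)

# Nested sets of returned nodes: the bare minimum k, then more, then all n.
for returned in (list(range(k)), list(range(8)), list(range(n))):
    print(len(returned), "nodes returned, noise gain:", noise_gain(G, returned))
```

Under these assumptions the gain can only shrink as more nodes are added to the returned set, and it equals k/n when all n nodes return, since the columns of the generator matrix are then orthogonal. How quickly the gain degrades as nodes go missing depends on the evaluation points, which is the design question the abstract addresses.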