Bivariate Polynomial Coding for Efficient Distributed Matrix Multiplication

Coded computing is an effective technique to mitigate "stragglers" in large-scale distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the task completion time depend only on the fastest workers. However, these schemes completely ignore the work done by the straggling workers, resulting in a waste of computational resources. To reduce the amount of work left unfinished at workers, one can further decompose the matrix multiplication task into smaller sub-tasks and assign multiple sub-tasks to each worker, possibly heterogeneously, to better fit their particular storage and computation capacities. In this work, we propose bivariate polynomial codes to efficiently exploit the work carried out by straggling workers. We show that bivariate polynomial codes bring significant advantages in terms of upload communication costs and storage efficiency, measured in terms of the number of sub-tasks that can be computed per worker. We propose two bivariate polynomial coding schemes. The first one exploits the fact that bivariate interpolation is always possible on a rectangular grid of evaluation points. We obtain such points at the cost of adding some redundant computations. For the second scheme, we relax the decoding constraints and require decodability for almost all choices of the evaluation points. We present interpolation sets satisfying such decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the completion time of distributed matrix multiplication. We believe that this work opens up a new class of previously unexplored coding schemes for efficient coded distributed computation.
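The rectangular-grid property mentioned above can be illustrated with a small sketch. It is not the paper's scheme: the partitioning (A split into K row blocks, B into L column blocks), the function names, and the evaluation points are all assumptions made for illustration, and sub-task assignment and storage constraints are ignored. The sketch encodes the blocks as A(x) = Σ A_i x^i and B(y) = Σ B_j y^j, evaluates A(x)B(y) on a K × L grid of distinct points, and recovers every product A_i B_j by solving the resulting linear system exactly.

```python
from fractions import Fraction
from itertools import product


def matmul(P, Q):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def poly_eval_blocks(blocks, x):
    """Evaluate sum_i blocks[i] * x**i entrywise (all blocks share one shape)."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(blk[r][c] * x**i for i, blk in enumerate(blocks))
             for c in range(cols)] for r in range(rows)]


def solve(M, R):
    """Solve M X = R exactly by Gauss-Jordan elimination over Fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in M[k]] + [Fraction(v) for v in R[k]] for k in range(n)]
    for col in range(n):
        # Find a non-zero pivot; one exists whenever M is invertible.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def coded_multiply(A_blocks, B_blocks, xs, ys):
    """Hypothetical helper: encode, 'compute' on a K x L grid, then decode.

    Returns a dict mapping (i, j) to the block product A_blocks[i] @ B_blocks[j].
    xs must hold K distinct points and ys must hold L distinct points.
    """
    K, L = len(A_blocks), len(B_blocks)
    grid = list(product(xs[:K], ys[:L]))  # rectangular grid of evaluation points
    # Each grid point stands for one coded sub-task: A(x) times B(y).
    results = [matmul(poly_eval_blocks(A_blocks, x), poly_eval_blocks(B_blocks, y))
               for x, y in grid]
    # Row (x, y) of the interpolation matrix holds the monomials x^i * y^j.
    M = [[x**i * y**j for i in range(K) for j in range(L)] for x, y in grid]
    R = [[v for row in res for v in row] for res in results]  # flatten each result
    X = solve(M, R)
    cols = len(B_blocks[0][0])
    decoded = {}
    for idx, (i, j) in enumerate(product(range(K), range(L))):
        flat = X[idx]
        decoded[(i, j)] = [flat[r * cols:(r + 1) * cols] for r in range(len(flat) // cols)]
    return decoded


# Toy data: K = 2 row blocks of A (each 1 x 2), L = 2 column blocks of B (each 2 x 1).
A_blocks = [[[1, 2]], [[3, 4]]]
B_blocks = [[[5], [6]], [[7], [8]]]
decoded = coded_multiply(A_blocks, B_blocks, xs=[1, 2], ys=[1, 3])
for (i, j), block in decoded.items():
    assert block == matmul(A_blocks[i], B_blocks[j])
```

The interpolation matrix here is the Kronecker product of two Vandermonde matrices, so it is invertible whenever the x points are distinct and the y points are distinct. Exact `Fraction` arithmetic is used only to keep the toy example free of rounding error; a practical implementation would work over floating-point numbers or a finite field.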
