Coded computing is an effective technique to mitigate "stragglers" in large-scale distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the computation time depend only on the fastest workers. However, these schemes completely ignore the work done by the straggling workers, resulting in a waste of computational resources. To reduce the amount of work left unfinished at workers, one can further decompose the matrix multiplication task into smaller sub-tasks and assign multiple sub-tasks to each worker, possibly heterogeneously, to better fit their particular storage and computation capacities. In this work, we propose a novel family of bivariate polynomial codes to efficiently exploit the work carried out by straggling workers. We show that bivariate polynomial codes bring significant advantages in terms of upload communication costs and storage efficiency, measured in terms of the number of sub-tasks that can be computed per worker. We propose two bivariate polynomial coding schemes. The first one exploits the fact that bivariate interpolation is always possible on a rectangular grid of evaluation points. We obtain such points at the cost of adding some redundant computations. For the second scheme, we relax the decoding constraints and require decodability only for almost all choices of the evaluation points. We present interpolation sets satisfying such decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the average computation time of distributed matrix multiplication. We believe this work opens up a new class of previously unexplored coding schemes for efficient coded distributed computation.
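The rectangular-grid property mentioned in the abstract can be illustrated with a minimal sketch. It is not the scheme proposed in the paper: it assumes A is split into K row blocks and B into L column blocks, encodes them as matrix polynomials in x and y respectively, and recovers all block products from evaluations on a K-by-L grid with no redundancy. The function names `encode` and `bivariate_coded_matmul` and the partitioning choice are hypothetical, chosen only for illustration.

```python
import numpy as np

def encode(blocks, point):
    """Evaluate the matrix polynomial sum_i blocks[i] * point**i."""
    return sum(blk * point**i for i, blk in enumerate(blocks))

def bivariate_coded_matmul(A, B, K, L, xs, ys):
    """Recover A @ B from evaluations of A(x) B(y) on the grid xs x ys.

    Assumes len(xs) == K and len(ys) == L, with distinct points in each list.
    """
    A_blocks = np.split(A, K, axis=0)  # K row blocks of A
    B_blocks = np.split(B, L, axis=1)  # L column blocks of B

    # Each grid point (x, y) stands for one sub-task: a single product of
    # an encoded block of A with an encoded block of B.
    evals = np.array([[encode(A_blocks, x) @ encode(B_blocks, y) for y in ys]
                      for x in xs])  # shape (K, L, rows, cols)

    # A(x) B(y) = sum_{i,j} (A_i B_j) x^i y^j, so on a rectangular grid the
    # interpolation matrix is a Kronecker product of two Vandermonde
    # matrices, which is invertible whenever the points are distinct.
    Vx = np.vander(xs, K, increasing=True)
    Vy = np.vander(ys, L, increasing=True)
    coeffs = np.einsum("ia,jb,abrc->ijrc",
                       np.linalg.inv(Vx), np.linalg.inv(Vy), evals)

    # coeffs[i, j] is the block product A_i B_j; reassemble the full result.
    return np.block([[coeffs[i, j] for j in range(L)] for i in range(K)])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 6))
C = bivariate_coded_matmul(A, B, K=2, L=3, xs=[1.0, 2.0], ys=[-1.0, 0.0, 1.0])
assert np.allclose(C, A @ B)
```

In a straggler-tolerant setting one would assign more evaluation points than the K·L needed here, so that decoding can proceed from whichever sub-tasks finish first; this sketch shows only the decoding step on a complete grid.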