One of the major challenges in using distributed learning to train complicated models on large data sets is dealing with the straggler effect. As a solution, coded computation has recently been proposed to efficiently add redundancy to the computation tasks. In this technique, coding is applied across data sets and computation is done over the coded data, such that the results of an arbitrary subset of worker nodes of a certain size are enough to recover the final results. The major challenges with these approaches are that (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the product of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, and (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC) as an alternative approach, which is not limited to polynomial function computation. In addition, the master node can approximately calculate the final results using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings, such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, where it outperforms repetitive computation (repetition coding) in terms of the rate of convergence.
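The numerical stability claimed above stems from Berrut's rational interpolant, which reconstructs a function from its values at a set of nodes using alternating-sign barycentric weights, without solving an ill-conditioned linear system. As a minimal sketch (not the paper's implementation; the function name and NumPy usage are illustrative assumptions), the master node could recover an approximation of the target function from the worker results it has received as follows:

```python
import numpy as np

def berrut_interpolate(x_nodes, f_vals, x_eval):
    """Evaluate Berrut's rational interpolant at the points x_eval.

    x_nodes : interpolation nodes (e.g. Chebyshev points assigned to workers)
    f_vals  : function values returned by the workers at those nodes
    x_eval  : points at which the master wants the approximate result
    """
    # Berrut's first interpolant uses the simple weights w_i = (-1)^i,
    # which are well defined for ANY set of distinct nodes -- this is
    # what lets the master use an arbitrary subset of worker results.
    w = (-1.0) ** np.arange(len(x_nodes))
    diff = x_eval[:, None] - x_nodes[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = w / diff                      # barycentric terms w_i / (x - x_i)
        num = terms @ f_vals                  # numerator   sum_i w_i f_i / (x - x_i)
        den = terms.sum(axis=1)               # denominator sum_i w_i / (x - x_i)
        r = num / den
    # If an evaluation point coincides with a node, return the node value exactly.
    rows, cols = np.nonzero(np.isclose(diff, 0.0))
    r[rows] = f_vals[cols]
    return r
```

Because the barycentric weights are fixed up front, evaluation costs only O(nm) for n nodes and m evaluation points, and the interpolant has no poles on the real line, consistent with the stability claim.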