We consider a generalization of the gradient coding framework where a dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over its assigned data subsets. Unlike the conventional framework, which requires the master node to recover the sum of the gradients over all the data subsets in the presence of straggler workers, we relax the goal to computing the sum of at least an $\alpha$ fraction of the gradients. We begin by deriving a lower bound on the computation load of any scheme and propose two strategies that achieve this lower bound, albeit at the cost of a high communication load and a number of data partitions that can be polynomial in $n$. We then propose schemes based on cyclic assignment that use $n$ data partitions and have a lower communication load. When each worker transmits a single linear combination, we prove lower bounds on the computation load of any scheme using $n$ data partitions. Finally, we describe a class of schemes that achieve different intermediate operating points for the computation and communication loads, and we provide simulation results to demonstrate the empirical performance of our schemes.
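To make the relaxed recovery goal concrete, the following is a minimal sketch, not the construction from the paper. It assumes a cyclic assignment in which worker $i$ holds partitions $i, i+1, \dots, i+d-1 \pmod n$, and it brute-forces the worst-case number of partitions held by the surviving workers when any $s$ workers straggle. The names `cyclic_assignment`, `worst_case_coverage`, and `min_load_for_fraction`, and the parameters `d` (partitions per worker) and `s` (number of stragglers), are hypothetical and chosen for illustration. Coverage is only a necessary condition for recovering the sum of an $\alpha$ fraction of the gradients; it is also sufficient in the extreme case where each worker sends every one of its gradients separately, which is the highest communication load.

```python
from itertools import combinations
from math import ceil


def cyclic_assignment(n, d):
    """Worker i holds partitions i, i+1, ..., i+d-1 (mod n). Illustrative only."""
    return [frozenset((i + j) % n for j in range(d)) for i in range(n)]


def worst_case_coverage(n, d, s):
    """Smallest number of distinct partitions held by any n - s surviving workers."""
    assignment = cyclic_assignment(n, d)
    worst = n
    # Enumerate every possible set of surviving (non-straggler) workers.
    for survivors in combinations(range(n), n - s):
        covered = set().union(*(assignment[w] for w in survivors))
        worst = min(worst, len(covered))
    return worst


def min_load_for_fraction(n, s, alpha):
    """Smallest d whose worst-case coverage reaches ceil(alpha * n) partitions.

    This is a coverage check under the cyclic assignment above, not a bound
    or scheme taken from the paper.
    """
    target = ceil(alpha * n)
    for d in range(1, n + 1):
        if worst_case_coverage(n, d, s) >= target:
            return d
    return None
```

For example, with $n = 6$ workers and $s = 2$ stragglers, `min_load_for_fraction(6, 2, 1.0)` returns 3, while relaxing to `min_load_for_fraction(6, 2, 0.8)` returns 2, which illustrates how a smaller $\alpha$ can reduce the per-worker computation load in this toy setting. The brute-force enumeration is exponential in $n$ and is meant only for small examples.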