We consider a generalization of the recently proposed gradient coding framework, in which a large dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over the data subsets assigned to it. Unlike the conventional framework, which requires the master node to recover the sum of the gradients over all the data subsets in the presence of $s$ straggler workers, we relax the goal of the master node to computing the sum of the gradients over at least an $\alpha$ fraction of the data subsets. The broad goal of our work is to study the optimal computation and communication load per worker for this approximate gradient coding framework. We begin by deriving a lower bound on the computation load of any feasible scheme and also propose a strategy which achieves this lower bound, albeit at the cost of a high communication load and a number of data partitions which can be polynomial in the number of workers $n$. We then restrict attention to schemes which utilize a number of data partitions equal to $n$ and propose schemes based on cyclic assignment which have a lower communication load. When each worker transmits a single linear combination, we also prove lower bounds on the computation load of any scheme using $n$ data partitions.
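As a brief illustration of the relaxation described above (a sketch only; the symbols $k$, $g_i$, and $\mathcal{S}$ are notational assumptions and are not taken from the abstract): if the dataset is split into $k$ partitions with $g_i$ denoting the partial gradient over the $i$-th partition, exact gradient coding requires the master to recover $\sum_{i=1}^{k} g_i$ despite any $s$ stragglers, whereas the approximate variant only requires the master to recover, for some subset $\mathcal{S} \subseteq \{1,\dots,k\}$,
\[
\hat{g} \;=\; \sum_{i \in \mathcal{S}} g_i, \qquad |\mathcal{S}| \;\ge\; \alpha k .
\]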