We consider gradient coding in the presence of an adversary, controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the inputs of the malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we reduce replication by proposing a method that detects the erroneous inputs from the malicious workers, hence transforming them into erasures. For $s$ malicious workers, our solution can reduce the replication to $s+1$ instead of $2s+1$ for each partial gradient at the expense of only $s$ additional computations at the main node and additional rounds of light communication between the main node and the workers. We give fundamental limits of the general framework for fractional repetition data allocation. Our scheme is optimal in terms of replication and local computation but incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound.
翻译:我们考虑在存在对计算进行损坏的所谓恶意工人的情况下进行梯度编码。之前的工作建议使用 MDS 编码将恶意工人的输入视为错误并使用代码的误差纠正属性加以纠正。这是以增加复制(每个部分梯度计算的工人数量)为代价的。在这项工作中,我们通过提出一种方法来减少复制,该方法检测恶意工人的错误输入,将其转化为擦除。对于 $s$ 个恶意工人,我们的解决方案将每个部分梯度的复制减少到 $s+1$,而不是 $2s+1$,代价是在主节点和工作节点之间进行轻量级通信的 $s$ 次额外计算和轮次。我们给出了分数重复数据分配的一般框架的基本极限。我们的方案在复制和本地计算方面是最优的,但会产生通信成本,该成本在数据集的大小方面是一个渐近乘性因子,与导出的上限相差不远。