We consider the problem of evaluating arbitrary multivariate polynomials over a massive dataset in a distributed computing system with a master node and multiple worker nodes. We propose Generalized Lagrange Coded Computing (GLCC) codes, which provide robustness against stragglers that do not return computation results in time and against adversarial workers that deliberately modify results for their own benefit, as well as information-theoretic security of the dataset against possible collusion among workers. GLCC codes are constructed by first partitioning the dataset into multiple groups and then encoding the dataset using carefully designed interpolation polynomials, such that interfering computation results across groups can be eliminated at the master. In particular, GLCC codes include the state-of-the-art Lagrange Coded Computing (LCC) codes as a special case, and achieve a more flexible tradeoff between communication and computation overheads when optimizing system efficiency.
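To make the encode, compute, and decode pipeline concrete, the following is a minimal sketch of the LCC special case only, with straggler tolerance and nothing else. It does not implement the GLCC grouping, adversary detection, or the random masking needed for security. The prime modulus `P`, the polynomial `f`, the choice of evaluation points, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch of Lagrange Coded Computing over a prime field GF(P).
# Assumptions: scalar data blocks, no adversaries, no privacy masking.
P = 2_147_483_647  # assumed prime modulus (2^31 - 1)


def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial through the points (xs[i], ys[i]) over GF(p)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def f(x, p=P):
    """Hypothetical degree-2 polynomial to be evaluated on every data block."""
    return (x * x + 3 * x + 1) % p


def lcc_demo(data, n_workers, stragglers, deg_f=2, p=P):
    """Return [f(x) for x in data], recovered from the non-straggling workers only."""
    k = len(data)
    betas = list(range(1, k + 1))                    # interpolation points for the data
    alphas = list(range(k + 1, k + 1 + n_workers))   # one evaluation point per worker

    # Master: encode. u(z) passes through (betas[i], data[i]); worker i gets u(alphas[i]).
    shares = [lagrange_eval(betas, data, a, p) for a in alphas]

    # Workers: each applies f to its coded share; stragglers never respond.
    results = {i: f(s, p) for i, s in enumerate(shares) if i not in stragglers}

    # Master: f(u(z)) has degree at most (k - 1) * deg_f, so this many results suffice.
    need = (k - 1) * deg_f + 1
    if len(results) < need:
        raise ValueError("too many stragglers to decode")
    idx = sorted(results)[:need]
    xs = [alphas[i] for i in idx]
    ys = [results[i] for i in idx]

    # Decode: interpolate f(u(z)) and read it off at the data points.
    return [lagrange_eval(xs, ys, b, p) for b in betas]


if __name__ == "__main__":
    print(lcc_demo([5, 7, 11], n_workers=8, stragglers={0, 3, 6}))  # [41, 71, 155]
```

With three data blocks and a degree-2 polynomial, the master needs (3 - 1) * 2 + 1 = 5 worker results, so 8 workers tolerate up to 3 stragglers in this toy setting. Handling adversarial workers would additionally require error-correcting (Reed-Solomon style) decoding, and privacy would require appending random blocks before encoding; both are omitted here.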