Matrix factorization is an important representation learning technique, used for example in recommender systems, in which a large matrix is factorized into the product of two low-dimensional matrices termed latent representations. This paper investigates matrix factorization in distributed computing systems with stragglers, i.e., compute nodes that are slow to return their computation results. A computation procedure, called coded Alternating Least Squares (ALS), is proposed to mitigate the effect of stragglers in such systems. The coded ALS algorithm iteratively computes the two low-dimensional latent matrices by solving systems of linear equations, with the Entangled Polynomial Code (EPC) as a building block. We theoretically characterize the maximum number of stragglers that the algorithm can tolerate (or the recovery threshold) in relation to the redundancy of the coding (or the code rate). In addition, we derive the computational complexity of the coded ALS algorithm and conduct numerical experiments to validate our design.
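To make the iterative structure concrete, the following is a minimal sketch of plain (uncoded, single-machine) ALS on a small, fully observed matrix. It is not the coded procedure proposed in the paper: the EPC-based distribution of the matrix products across workers is omitted, and every name here (`solve`, `update_factor`, `als`, `reconstruction_error`, the regularization weight `lam`) is a hypothetical choice made for illustration. Each half-step fixes one latent matrix and solves a small regularized linear system for every row of the other.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_factor(ratings, fixed, lam):
    """For each row r of `ratings`, solve (F^T F + lam I) x = F^T r with F = fixed."""
    k = len(fixed[0])
    gram = [[sum(f[p] * f[q] for f in fixed) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    return [solve(gram, [sum(f[p] * r for f, r in zip(fixed, row)) for p in range(k)])
            for row in ratings]


def als(ratings, rank, lam=1e-3, iters=20, seed=0):
    """Alternate between the two latent matrices so that ratings ~ users * items^T."""
    rng = random.Random(seed)
    items = [[rng.random() for _ in range(rank)] for _ in ratings[0]]
    ratings_t = [list(col) for col in zip(*ratings)]
    for _ in range(iters):
        users = update_factor(ratings, items, lam)    # items fixed, solve for users
        items = update_factor(ratings_t, users, lam)  # users fixed, solve for items
    return users, items


def reconstruction_error(ratings, users, items):
    """Largest absolute entrywise gap between ratings and users * items^T."""
    return max(abs(r - sum(u * v for u, v in zip(users[i], items[j])))
               for i, row in enumerate(ratings) for j, r in enumerate(row))


if __name__ == "__main__":
    # A 4 x 3 matrix of rank 2, chosen only for illustration.
    example = [[1.0, 3.0, 0.0], [2.0, 1.0, 1.0], [3.0, 4.0, 1.0], [4.0, 7.0, 1.0]]
    u, v = als(example, rank=2, lam=1e-6)
    print("max reconstruction error:", reconstruction_error(example, u, v))
```

In a distributed setting, the Gram matrices and the right-hand sides computed in `update_factor` are matrix products, which is the kind of operation a coded scheme would spread across workers so that the result can be recovered without waiting for the slowest nodes.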