Large-scale machine learning increasingly relies on distributed optimization, whereby several machines contribute to the training process of a statistical model. In this work we study the performance of asynchronous, distributed settings when applying sparsification, a technique used to reduce communication overheads. In particular, for the first time in an asynchronous, non-convex setting, we theoretically prove that, in the presence of staleness, sparsification does not harm SGD performance: the ergodic convergence rate matches the known result for standard SGD, namely $\mathcal{O}\left( 1/\sqrt{T} \right)$. We also carry out an empirical study to complement our theory, and confirm that the effect of sparsification on the convergence rate is negligible when compared to 'vanilla' SGD, even in the challenging scenario of an asynchronous, distributed system.
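As background for the sparsification operator mentioned above, the following is a minimal illustrative sketch of one common variant, top-$k$ gradient sparsification, in which each worker transmits only the $k$ largest-magnitude gradient coordinates. The choice of top-$k$, and the function and variable names, are assumptions for illustration; the abstract does not specify which sparsification operator is analyzed.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Illustrative top-k sparsifier (assumed variant, not necessarily the
    operator analyzed in the paper): keep the k largest-magnitude entries
    of the gradient and zero out the rest before communication."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

# Example: a worker sparsifies its stochastic gradient before sending it
# to a shared model, which may already have been updated by other workers
# (hence the gradient is "stale" by the time it is applied).
g = np.random.randn(10)
print(topk_sparsify(g, k=3))
```

In practice such an operator reduces the communicated payload from the full gradient dimension to $k$ index/value pairs, which is the communication saving the abstract refers to.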