Synchronous local stochastic gradient descent (local SGD) suffers from idle workers and random delays caused by slow, straggling workers, because each round waits for every worker to complete the same number of local updates. To mitigate stragglers and improve communication efficiency, this paper develops a novel local SGD strategy named STSyn. The key idea is to wait only for the $K$ fastest workers in each synchronization round while keeping all workers computing continually, and to make full use of every effective (completed) local update of each worker regardless of stragglers. To gauge the performance of STSyn, the paper analyzes the average wall-clock time, the average number of local updates, and the average number of uploading workers per round. The convergence of STSyn is also rigorously established, even when the objective function is nonconvex. Experimental results show that STSyn outperforms state-of-the-art schemes, owing to its straggler-tolerant technique and the additional effective local updates at each worker; the influence of system parameters is also studied. By waiting only for the faster workers and allowing heterogeneous synchronization, with different numbers of local updates across workers, STSyn provides substantial improvements in both time and communication efficiency.
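The timing of one synchronization round, as described above, can be illustrated with a minimal simulation sketch. It is not the paper's implementation. It assumes that per-update computation times are i.i.d. exponential, that each worker performs at most `max_updates` local updates per round, that the round ends when the `k`-th fastest worker reaches that cap, and that a worker uploads only if it has completed at least one update. The function name `simulate_round` and all of its parameters are hypothetical.

```python
import random


def simulate_round(num_workers, k, max_updates, rng, mean_step_time=1.0):
    """Simulate the timing of one synchronization round (illustrative only).

    Assumptions, not taken from the abstract: exponential per-update times,
    a per-round cap of `max_updates` local updates, and upload by any worker
    with at least one completed update.
    """
    # Cumulative finish time of every local update, per worker.
    finish_times = []
    for _ in range(num_workers):
        elapsed, times = 0.0, []
        for _ in range(max_updates):
            elapsed += rng.expovariate(1.0 / mean_step_time)
            times.append(elapsed)
        finish_times.append(times)

    # The round ends when the k-th fastest worker finishes its last update.
    round_time = sorted(times[-1] for times in finish_times)[k - 1]

    # Every update completed by then counts, including those of stragglers.
    completed = [sum(1 for t in times if t <= round_time)
                 for times in finish_times]
    uploading = sum(1 for c in completed if c >= 1)
    return round_time, completed, uploading


if __name__ == "__main__":
    rng = random.Random(0)
    round_time, completed, uploading = simulate_round(8, 3, 5, rng)
    print(f"round time: {round_time:.3f}")
    print(f"completed local updates per worker: {completed}")
    print(f"uploading workers: {uploading}")
```

Averaging `round_time`, `sum(completed) / num_workers`, and `uploading` over many simulated rounds gives empirical counterparts of the three per-round quantities the abstract says are analyzed, under the assumptions stated here.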