A major bottleneck of distributed learning under the parameter-server (PS) framework is the communication cost incurred by frequent bidirectional transmissions between the PS and the workers. To address this issue, local stochastic gradient descent (SGD) and worker selection have been exploited to reduce the communication frequency and the number of participating workers per round, respectively. However, partial participation can be detrimental to the convergence rate, especially when local datasets are heterogeneous. In this paper, to improve communication efficiency and speed up training, we develop a novel worker selection strategy named AgeSel. The key enabler of AgeSel is the use of the ages of workers to balance their participation frequencies. We rigorously establish the convergence of local SGD with the proposed age-based partial worker participation. Simulation results demonstrate that AgeSel can significantly reduce both the number of training rounds needed to reach a target accuracy and the communication cost. The influence of the algorithm's hyper-parameter is also explored to illustrate the benefit of age-based worker selection.
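The age-based selection idea can be sketched as follows. This is a minimal illustration only, not the paper's exact rule: here a worker's "age" is assumed to be the number of rounds since it last participated, `tau` stands in for the age-threshold hyper-parameter mentioned above, and workers whose age exceeds the threshold are prioritized while remaining slots are filled at random.

```python
import random

def select_workers(ages, num_select, tau):
    """One round of (hypothetical) age-based worker selection.

    ages: list where ages[i] = rounds since worker i last participated.
    num_select: number of workers the PS selects this round.
    tau: age-threshold hyper-parameter; workers with age >= tau get priority,
         which bounds how long any worker can go without participating.
    """
    # Workers whose age has reached the threshold, oldest first.
    stale = sorted(
        (i for i, a in enumerate(ages) if a >= tau),
        key=lambda i: ages[i],
        reverse=True,
    )
    chosen = stale[:num_select]
    # Fill any remaining slots uniformly at random from the other workers.
    rest = [i for i in range(len(ages)) if i not in chosen]
    chosen += random.sample(rest, num_select - len(chosen))
    # Update ages: reset for selected workers, increment for the others.
    for i in range(len(ages)):
        ages[i] = 0 if i in chosen else ages[i] + 1
    return chosen
```

Under this sketch, raising `tau` makes selection closer to uniform random sampling, while lowering it forces more round-robin-like participation of stale workers.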