In federated learning, a large number of users collaborate on a global learning task. They alternate between local computations and communication with a distant server. Communication, which can be slow and costly, is the main bottleneck in this setting. To accelerate distributed gradient descent, the popular strategy of local training is to communicate less frequently; that is, to perform several iterations of local computations between communication steps. A recent breakthrough in this field was made by Mishchenko et al. (2022): their Scaffnew algorithm is the first to provably benefit from local training, with accelerated communication complexity. However, it remained an open and challenging question whether the powerful mechanism behind Scaffnew is compatible with partial participation, the desirable feature that not all clients need to participate in every round of the training process. We answer this question in the affirmative and propose a new algorithm that handles local training and partial participation, with state-of-the-art communication complexity.
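For readers unfamiliar with the local-training pattern mentioned above, the following is a minimal sketch, in Python, of the generic scheme (several local gradient steps per client between communication rounds, followed by server-side averaging). It is only an illustration of the general idea, not Scaffnew nor the algorithm proposed in this paper; all names (local_training_round, client_grads, lr, local_steps) are illustrative assumptions.

```python
# Illustrative sketch of local training in distributed gradient descent:
# each client starts from the shared model, performs several local gradient
# steps without communicating, and the server averages the results.
import numpy as np

def local_training_round(x, client_grads, lr=0.1, local_steps=5):
    """One communication round of plain local gradient descent (hypothetical helper)."""
    local_models = []
    for grad_fn in client_grads:          # one gradient oracle per client
        x_local = x.copy()
        for _ in range(local_steps):      # local computations, no communication
            x_local -= lr * grad_fn(x_local)
        local_models.append(x_local)
    return np.mean(local_models, axis=0)  # single communication + averaging

# Toy example: two clients with quadratic objectives f_i(x) = ||x - a_i||^2 / 2,
# whose gradients are x - a_i.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
client_grads = [(lambda a: (lambda x: x - a))(a) for a in targets]
x = np.zeros(2)
for _ in range(20):                       # 20 communication rounds
    x = local_training_round(x, client_grads)
print(x)                                  # approaches the average of the targets, [0.5, 0.5]
```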