Federated learning (FL) is typically performed in a synchronous parallel manner, where the involvement of a slow client delays an entire training iteration. Current FL systems employ a participant selection strategy to choose fast clients with quality data in each iteration. However, this is not always possible in practice, and the selection strategy often has to navigate an unpleasant trade-off between the speed and the data quality of clients. In this paper, we present Pisces, an asynchronous FL system with intelligent participant selection and model aggregation for accelerated training. To avoid incurring excessive resource cost and stale training computation, Pisces uses a novel scoring mechanism to identify suitable clients to participate in a training iteration. It also adapts the pace of model aggregation to dynamically bound the progress gap between the selected clients and the server, with a provable convergence guarantee in a smooth non-convex setting. We have implemented Pisces in an open-source FL platform called Plato, and evaluated its performance in large-scale experiments with popular vision and language models. Pisces outperforms the state-of-the-art synchronous and asynchronous schemes, accelerating the time-to-accuracy by up to 2.0x and 1.9x, respectively.