Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network bandwidth is often the main bottleneck. Prior work often overcomes these constraints by condensing the models or messages into compact formats, e.g., by gradient compression or distillation. In contrast, we propose ProgFed, the first progressive training framework for efficient and effective federated learning. It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models. We theoretically prove that ProgFed converges at the same asymptotic rate as standard training on full models. Extensive results on a broad range of architectures, including CNNs (VGG, ResNet, ConvNets) and U-nets, and diverse tasks from simple classification to medical image segmentation show that our highly effective training approach saves up to $20\%$ computation and up to $63\%$ communication costs for converged models. As our approach is also complementary to prior work on compression, we can achieve a wide range of trade-offs by combining these techniques, showing reduced communication of up to $50\times$ at only $0.1\%$ loss in utility. Code is available at https://github.com/a514514772/ProgFed.
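To make the communication argument concrete, below is a minimal, self-contained sketch (not the authors' implementation; see the repository above for that) of how progressively growing the trained submodel reduces two-way communication in a federated setting. The stage sizes, auxiliary-head size, client count, and growth schedule are illustrative assumptions chosen only to show the accounting.

```python
# Illustrative sketch: communication cost of full-model federated training vs.
# progressive, stage-wise training. All constants below are hypothetical.

STAGE_PARAMS = [1.2e6, 2.4e6, 4.8e6, 9.6e6]   # parameters per model stage (assumed)
AUX_HEAD_PARAMS = 0.1e6                        # lightweight head attached to partial models (assumed)
NUM_CLIENTS_PER_ROUND = 10
TOTAL_ROUNDS = 400
ROUNDS_PER_STAGE = TOTAL_ROUNDS // len(STAGE_PARAMS)  # grow the model every 100 rounds


def params_exchanged_full(round_idx: int) -> float:
    """Standard full-model training: the entire model is sent down and back up every round."""
    return sum(STAGE_PARAMS)


def params_exchanged_progressive(round_idx: int) -> float:
    """Progressive training: only the currently active prefix of the model
    (plus a small auxiliary head) is exchanged; deeper stages join later."""
    active_stages = min(round_idx // ROUNDS_PER_STAGE + 1, len(STAGE_PARAMS))
    prefix = sum(STAGE_PARAMS[:active_stages])
    head = AUX_HEAD_PARAMS if active_stages < len(STAGE_PARAMS) else 0.0
    return prefix + head


def total_two_way_cost(per_round_fn) -> float:
    """Total parameters communicated: download + upload for every sampled client."""
    return sum(2 * NUM_CLIENTS_PER_ROUND * per_round_fn(r) for r in range(TOTAL_ROUNDS))


if __name__ == "__main__":
    full = total_two_way_cost(params_exchanged_full)
    prog = total_two_way_cost(params_exchanged_progressive)
    print(f"full-model training : {full / 1e9:.2f}B parameters exchanged")
    print(f"progressive training: {prog / 1e9:.2f}B parameters exchanged")
    print(f"communication saved : {100 * (1 - prog / full):.1f}%")
```

Under these assumed numbers, early rounds exchange only a small prefix of the model, which is where the savings come from; the same accounting composes with gradient compression, since compression shrinks whatever submodel is currently being exchanged.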