Federated learning (FL) enables geographically dispersed edge devices (i.e., clients) to learn a global model without sharing their local datasets: each client performs gradient descent on its local data and uploads the gradients to a central server, which updates the global model. However, FL incurs massive communication overhead resulting from uploading the gradients in each training round. To address this problem, most existing research compresses the gradients with a fixed, uniform quantization resolution for all clients, which neither adapts the quantization to the varying gradient norms across rounds nor exploits client heterogeneity to accelerate FL. In this paper, we propose a novel adaptive and heterogeneous gradient quantization algorithm (AdaGQ) for FL that minimizes the wall-clock training time from two aspects: i) adaptive quantization, which exploits the change of the gradient norm to adjust the quantization resolution in each training round; and ii) heterogeneous quantization, which assigns lower quantization resolution to slow clients to align their training time with that of other clients and mitigate the communication bottleneck, and higher quantization resolution to fast clients to achieve a better tradeoff between communication efficiency and accuracy. Evaluations on various models and datasets validate the benefits of AdaGQ, which reduces the total training time by up to 52.1% compared with baseline algorithms (e.g., FedAvg, QSGD).
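To make the quantization mechanism concrete, the following is a minimal sketch (not the authors' implementation) of QSGD-style stochastic gradient quantization with an adjustable bit-width, together with a hypothetical norm-based rule for changing the resolution between rounds. The function names and the specific thresholds in `adjust_bits` are illustrative assumptions, not AdaGQ's exact policy.

```python
import numpy as np

def quantize(grad: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Stochastically quantize a gradient vector to 2**bits - 1 levels (QSGD-style)."""
    levels = 2 ** bits - 1
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad, dtype=np.int32), 0.0
    # Scale magnitudes to [0, levels] and round up or down at random so that
    # the quantized value is an unbiased estimate of the original coordinate.
    scaled = np.abs(grad) / norm * levels
    lower = np.floor(scaled)
    prob_up = scaled - lower
    q = lower + (np.random.rand(*grad.shape) < prob_up)
    return (np.sign(grad) * q).astype(np.int32), norm

def dequantize(q: np.ndarray, norm: float, bits: int) -> np.ndarray:
    """Recover the gradient estimate from its quantized form."""
    levels = 2 ** bits - 1
    return q.astype(np.float64) * norm / levels

def adjust_bits(prev_norm: float, cur_norm: float, bits: int,
                min_bits: int = 2, max_bits: int = 8) -> int:
    """Hypothetical adjustment rule: when the gradient norm shrinks, fewer
    levels keep the absolute quantization error small, so reduce the bits;
    when the norm grows, increase them."""
    if cur_norm < 0.5 * prev_norm:
        return max(min_bits, bits - 1)
    if cur_norm > 2.0 * prev_norm:
        return min(max_bits, bits + 1)
    return bits

# Example round: a slow client could be assigned fewer bits than a fast one,
# trading gradient precision for a smaller upload.
grad = np.random.randn(1000)
q, norm = quantize(grad, bits=4)
recovered = dequantize(q, norm, bits=4)
```

In this sketch the per-round upload cost scales roughly with the chosen bit-width, so lowering the resolution for slow clients shortens their communication time, while the unbiasedness of the stochastic rounding keeps the aggregated update a valid gradient estimate.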