We study a practical data characteristic underlying federated learning, where clients' non-i.i.d. data have sparse features, and each client's local data normally involve only a small part of the full model, called a submodel. Due to data sparsity, the classical federated averaging (FedAvg) algorithm and its variants are severely slowed down, because when the global model is updated, each client's zero update to the part of the full model outside its submodel is inaccurately averaged in. We therefore propose federated submodel averaging (FedSubAvg), which ensures that the expectation of the global update of each model parameter equals the average of the local updates of the clients whose submodels involve it. We theoretically prove the convergence rate of FedSubAvg by deriving an upper bound under a new metric, the element-wise gradient norm. In particular, this new metric can characterize the convergence of federated optimization over sparse data, whereas the conventional squared gradient norm used for FedAvg and its variants cannot. We extensively evaluate FedSubAvg over both public and industrial datasets, and the results demonstrate that it significantly outperforms FedAvg and its variants.
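To make the aggregation rule concrete, below is a minimal NumPy sketch contrasting per-coordinate averaging over involved clients with plain FedAvg. It shows the deterministic full-participation case (the "expectation" in the abstract refers to randomness such as client sampling); the names `fed_sub_avg`, `fed_avg`, `updates`, and `masks` are illustrative, not the paper's implementation.

```python
import numpy as np

def fed_sub_avg(updates, masks):
    """Per-coordinate aggregation: average each parameter's update only
    over the clients whose submodels involve that parameter."""
    updates = np.asarray(updates, dtype=float)  # shape: (num_clients, dim)
    masks = np.asarray(masks, dtype=float)      # 1 iff the client's submodel covers the coordinate
    counts = masks.sum(axis=0)                  # number of clients involving each coordinate
    total = (updates * masks).sum(axis=0)       # sum of the relevant local updates
    # Coordinates covered by no client keep a zero global update.
    return np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)

def fed_avg(updates):
    """FedAvg baseline: averages over all clients, so the zero updates from
    clients whose submodels exclude a coordinate dilute its aggregate."""
    return np.mean(np.asarray(updates, dtype=float), axis=0)

# Toy example: three clients, four parameters; each client updates only its submodel.
masks = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]
updates = [[0.4, 0.2, 0.0, 0.0],
           [0.0, 0.2, 0.6, 0.0],
           [0.0, 0.0, 0.6, 0.8]]

print(fed_avg(updates))             # zeros outside submodels shrink every coordinate
print(fed_sub_avg(updates, masks))  # each coordinate averaged over involved clients only
```

On the first coordinate, for instance, FedAvg yields 0.4/3 ≈ 0.133 because two uninvolved clients contribute zeros, while the per-coordinate rule recovers the involved client's update of 0.4, illustrating the dilution effect the abstract attributes to data sparsity.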