We consider a many-to-one wireless architecture for federated learning at the network edge, where multiple edge devices collaboratively train a model using local data. The unreliable nature of wireless connectivity, together with constraints on computing resources at edge devices, dictates that the local updates at edge devices should be carefully crafted and compressed to match the available wireless communication resources, and should work in concert with the receiver. Thus motivated, we propose SGD-based bandlimited coordinate descent algorithms for such settings. Specifically, for the wireless edge employing over-the-air computing, a common subset of k coordinates of the gradient updates across edge devices is selected by the receiver in each iteration and then transmitted simultaneously over k sub-carriers, each experiencing time-varying channel conditions. We characterize the impact of communication error and compression on the convergence of the proposed algorithms, in terms of the resulting gradient bias and mean squared error. We then study learning-driven communication error minimization via joint optimization of power allocation and learning rates. Our findings reveal that optimal power allocation across sub-carriers should take into account both the gradient values and the channel conditions, thus generalizing the widely used water-filling policy. We also develop sub-optimal distributed solutions amenable to implementation.
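To make the setting concrete, the following is a minimal Python sketch of one bandlimited update with over-the-air aggregation. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the function names are hypothetical, the receiver is assumed to pick the k coordinates with the largest magnitude in some reference vector it holds, and the fading on each sub-carrier is assumed to be perfectly compensated, so that the only communication error is additive Gaussian noise.

```python
import random


def select_coordinates(reference, k):
    """Receiver-side choice (assumed rule): indices of the k entries of
    `reference` with the largest magnitude, returned in ascending order."""
    order = sorted(range(len(reference)),
                   key=lambda i: abs(reference[i]), reverse=True)
    return sorted(order[:k])


def over_the_air_sum(local_grads, coords, noise_std, rng):
    """All devices transmit the same k coordinates at once, one coordinate
    per sub-carrier. The channel adds the analog signals; the receiver sees
    the sum plus noise. Fading is assumed to be perfectly compensated."""
    received = {}
    for c in coords:
        superposed = sum(g[c] for g in local_grads)
        received[c] = superposed + rng.gauss(0.0, noise_std)
    return received


def bandlimited_step(w, local_grads, coords, lr, noise_std, rng):
    """One SGD-style step that updates only the selected coordinates,
    using the noisy average of the devices' gradients."""
    m = len(local_grads)
    received = over_the_air_sum(local_grads, coords, noise_std, rng)
    new_w = list(w)
    for c, value in received.items():
        new_w[c] = w[c] - lr * value / m
    return new_w
```

Calling `bandlimited_step` with `noise_std=0.0` reduces to an exact coordinate descent step on the averaged gradient. Raising `noise_std` or shrinking k introduces the communication and compression errors whose effect on convergence the abstract describes.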