In this chapter, we will mainly focus on collaborative training across wireless devices. Training an ML model is equivalent to solving an optimization problem, and many distributed optimization algorithms have been developed over the last decades. These distributed ML algorithms provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local. This addresses, to some extent, the privacy concern. They also provide computational scalability, as they allow exploiting the computational resources distributed across many edge devices. In practice, however, this does not directly translate into a linear gain in the overall learning speed with the number of devices. This is partly due to the communication bottleneck limiting the overall computation speed. Moreover, wireless devices are highly heterogeneous in their computational capabilities, and both their computation speed and communication rate can be highly time-varying due to physical factors. Therefore, distributed learning algorithms, particularly those to be implemented at the wireless network edge, must be carefully designed taking into account the impact of the time-varying communication network as well as the heterogeneous and stochastic computation capabilities of the devices.
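The collaborative training pattern described above can be illustrated with a minimal sketch of local-update-then-average training (in the style of federated averaging). All names and parameter values here are illustrative assumptions, not a specific algorithm from this chapter: each simulated device holds a private least-squares dataset, performs a few local gradient steps on the shared model, and only the updated model parameters (never the raw data) are communicated and averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (hypothetical sizes): K devices, each holding a
# private dataset generated from the same underlying linear model w_star.
K, d, n = 4, 3, 50
w_star = rng.normal(size=d)
local_data = []
for _ in range(K):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.01 * rng.normal(size=n)
    local_data.append((X, y))  # this data never leaves the device

def local_update(w, X, y, lr=0.05, steps=10):
    """One round of local gradient descent on a device's private data."""
    w = w.copy()
    for _ in range(steps):
        grad = (2.0 / len(y)) * X.T @ (X @ w - y)  # least-squares gradient
        w -= lr * grad
    return w

# Collaborative training: devices refine the shared model locally,
# and only the model parameters are exchanged and averaged.
w = np.zeros(d)
for _ in range(20):
    updates = [local_update(w, X, y) for X, y in local_data]
    w = np.mean(updates, axis=0)

print("error:", np.linalg.norm(w - w_star))
```

In a wireless deployment, the averaging step is exactly where the communication bottleneck and device heterogeneity discussed above enter: slow or straggling devices delay each round, which motivates the careful co-design of learning and communication treated in this chapter.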