Traditionally, distributed machine learning takes the guise of (i) different nodes training the same model (as in federated learning), or (ii) one model being split among multiple nodes (as in distributed stochastic gradient descent). In this work, we highlight how fog- and IoT-based scenarios often require combining both approaches, and we present a framework for flexible parallel learning (FPL), achieving both data and model parallelism. Further, we investigate how different ways of distributing and parallelizing learning tasks across the participating nodes result in different computation, communication, and energy costs. Our experiments, carried out using state-of-the-art deep-network architectures and large-scale datasets, confirm that FPL allows for an excellent trade-off among computational (hence energy) cost, communication overhead, and learning performance.
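To make the combination of the two forms of parallelism concrete, the following is a minimal sketch, not the authors' FPL implementation: it simulates two data-parallel workers (each with its own data shard) training a model split into two sequential stages, standing in for two cooperating nodes. All names (Stage boundaries, NUM_WORKERS, layer sizes, learning rate) are illustrative assumptions.

```python
# Hedged sketch of combined data + model parallelism (not the FPL framework itself).
import torch
import torch.nn as nn

NUM_WORKERS = 2              # data parallelism: each worker trains on its own shard
stage1 = nn.Linear(16, 32)   # model parallelism: first stage, hosted on "node A" (assumed split)
stage2 = nn.Linear(32, 4)    # second stage, hosted on "node B"
loss_fn = nn.CrossEntropyLoss()
params = list(stage1.parameters()) + list(stage2.parameters())

# Each worker computes gradients on its local shard; gradients are then averaged,
# mimicking federated-style aggregation on top of the split (model-parallel) network.
grads = []
for w in range(NUM_WORKERS):
    x = torch.randn(8, 16)               # worker-local data shard (synthetic)
    y = torch.randint(0, 4, (8,))
    out = stage2(torch.relu(stage1(x)))  # activations cross the model split between stages
    loss = loss_fn(out, y)
    grads.append(torch.autograd.grad(loss, params))

# Average gradients across workers and apply a single SGD step.
with torch.no_grad():
    for i, p in enumerate(params):
        p -= 0.01 * sum(g[i] for g in grads) / NUM_WORKERS
```

In a real fog/IoT deployment, the per-worker loop would run on separate devices and the stage boundary would determine which activations and gradients cross the network, which is where the computation, communication, and energy trade-offs discussed above arise.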