Sparse-Push: Communication- & Energy-Efficient Decentralized Distributed Learning over Directed & Time-Varying Graphs with non-IID Datasets

Current deep learning (DL) systems rely on a centralized computing paradigm that limits the amount of available training data, increases system latency, and adds privacy and security constraints. On-device learning, enabled by decentralized and distributed training of DL models over peer-to-peer, wirelessly connected edge devices, not only alleviates these limitations but also enables next-generation applications that need DL models to continuously interact with and learn from their environment. However, this requires new training algorithms that train DL models over time-varying and directed peer-to-peer graph structures while minimizing the amount of communication between devices and remaining resilient to non-IID data distributions. In this work, we propose Sparse-Push, a communication-efficient decentralized distributed training algorithm that supports training over peer-to-peer, directed, and time-varying graph topologies. The proposed algorithm achieves a 466x reduction in communication with only a 1% degradation in performance when training various DL models, such as ResNet-20 and VGG11, on the CIFAR-10 dataset. Further, we demonstrate how communication compression can lead to significant performance degradation in the case of non-IID datasets, and propose the Skew-Compensated Sparse-Push algorithm, which recovers this performance drop while maintaining similar levels of communication compression.
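
The abstract does not spell out how Sparse-Push works. Purely as an illustration of the two generic ingredients it names, training over directed, time-varying graphs and compressing what is communicated, here is a minimal sketch assuming push-sum gossip (the averaging primitive used by Stochastic Gradient Push on directed graphs) and top-k sparsification. The topology (a hypothetical one-peer exponential graph) and the names `out_peer`, `push_sum_average`, and `topk_sparsify` are illustrative assumptions, not the paper's algorithm. In particular, the sketch does not attempt to combine the two, which is exactly where a method like Sparse-Push has to take care.

```python
import math


def topk_sparsify(vec, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the rest.
    (Generic compressor; not necessarily the one Sparse-Push uses.)"""
    if k >= len(vec):
        return list(vec)
    keep = set(sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(vec)]


def out_peer(i, t, n):
    """Hypothetical time-varying directed topology: at step t, node i pushes to a
    single peer at offset 2^(t mod floor(log2 n)), i.e. a one-peer exponential graph."""
    hops = max(1, int(math.log2(n)))
    return (i + 2 ** (t % hops)) % n


def push_sum_average(values, steps):
    """Push-sum gossip over the directed graph sequence defined by out_peer.

    Each node splits its numerator x_i and weight w_i equally between itself and
    its single out-peer. The mixing is column-stochastic, so sum(x) and sum(w)
    are conserved, and the de-biased estimate x_i / w_i approaches the network
    average. (On this regular topology every w_i stays 1; on irregular directed
    graphs the weights drift, and dividing by w_i removes the resulting bias.)
    """
    n, dim = len(values), len(values[0])
    x = [list(v) for v in values]
    w = [1.0] * n
    for t in range(steps):
        new_x = [[0.0] * dim for _ in range(n)]
        new_w = [0.0] * n
        for i in range(n):
            j = out_peer(i, t, n)
            for dst in (i, j):  # half stays local, half is pushed to the peer
                new_w[dst] += w[i] / 2
                for d in range(dim):
                    new_x[dst][d] += x[i][d] / 2
        x, w = new_x, new_w
    return [[xd / w[i] for xd in x[i]] for i in range(n)]


if __name__ == "__main__":
    params = [[float(i), float(2 * i)] for i in range(8)]  # 8 nodes, 2-dim "models"
    print(push_sum_average(params, steps=3)[0])
    print(topk_sparsify([0.1, -3.0, 2.0, 0.5], k=2))
```

With 8 nodes and offsets 1, 2, 4, three push-sum steps multiply out to an exact average, because the product of the matrices (I + P^(2^t))/2 for t = 0, 1, 2 is the uniform averaging matrix (P being the cyclic shift). With other node counts the estimates only converge asymptotically. Sending top-k entries instead of the full vector shrinks each message, but naively sparsifying push-sum messages breaks the conservation property above, which is why compressed decentralized methods need extra machinery.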

翻译：目前深层学习(DL)系统依赖于一种中央化的计算模式,这种模式限制现有培训数据的数量,增加系统的延缓性,并增加隐私和安全限制。在线学习,通过对等对等无线连接边缘设备对DL模型进行分散和分散的培训,不仅减轻上述限制,而且使需要DL模型的下一代应用能够不断互动并从环境中学习。然而,这需要开发新的培训算法,在时间变化和引导对等平方图结构中培训DL模型,同时最大限度地减少设备之间的通信量,同时适应非IID数据分布。在这项工作中,我们提议,Sprass-Push,一种高效的分散式传播培训算法,支持对等对等对等方培训、定向和时间变化的图形表层。提议的算法使得在培训各种DL模型,如ResNet-20和VGG11在CIFAR-10数据集中只减少1%的通信性能退化。此外,我们证明通信压缩如何导致显著的性能退化,同时提议SK-II系统-Sqrassimal 的性平流数据级恢复。