Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is therefore a challenging task. In the literature, researchers often resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including those associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose DecentralizePy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of DecentralizePy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.
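To make the sparsification technique mentioned above concrete, the following is a minimal sketch of top-k sparsification of a model update before it is exchanged with neighbors; it assumes a PyTorch-style flattened parameter delta, and the function and parameter names are illustrative rather than DecentralizePy's actual API.

```python
import torch

def topk_sparsify(delta: torch.Tensor, keep_fraction: float = 0.01):
    """Keep only the largest-magnitude entries of a flattened model update.

    Returns the indices and values a node would send to its neighbors;
    all other entries are implicitly zero, shrinking the message size
    by roughly a factor of 1 / keep_fraction.
    """
    k = max(1, int(keep_fraction * delta.numel()))
    _, indices = torch.topk(delta.abs(), k)
    return indices, delta[indices]

# Illustrative usage: a node sparsifies its update before gossiping it.
update = torch.randn(1_000_000)          # hypothetical flattened model delta
idx, vals = topk_sparsify(update, 0.01)  # exchange ~1% of the parameters
```

The receiving neighbor would scatter the values back into a zero vector at the given indices before averaging, which is the standard way such sparse exchanges are reconstructed.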