Distributed learning has become an integral tool for scaling up machine learning and addressing the growing need for data privacy. Although more robust to changes in network topology, decentralized learning schemes have not gained the same level of popularity as their centralized counterparts because they are less competitive performance-wise. In this work, we attribute this gap to the lack of synchronization among decentralized learning workers, showing both empirically and theoretically that the convergence rate is tied to the level of synchronization among the workers. Thus motivated, we present a novel decentralized learning framework based on nonlinear gossiping (NGO) that enjoys an appealing finite-time consensus property, achieving better synchronization. We provide a careful analysis of its convergence and discuss its merits for modern distributed optimization applications, such as training deep neural networks. Our analysis of how communication delays and randomized chats affect learning further enables the derivation of practical variants that accommodate asynchronous and randomized communication. To validate the effectiveness of our proposal, we benchmark NGO against competing solutions through an extensive set of tests, with encouraging results.
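To make the gossiping paradigm concrete, below is a minimal sketch of one synchronous gossip-averaging round, the linear baseline that nonlinear gossiping refines: each worker mixes its parameters with those of its graph neighbors, so repeated rounds drive all workers toward the network-wide average (the consensus that NGO reaches in finite time). The ring topology, uniform mixing weights, and function names here are illustrative assumptions, not the paper's construction.

```python
# A sketch of linear gossip averaging on a ring of workers; the topology,
# weights, and names (gossip_round) are assumptions for illustration only.
import numpy as np

def gossip_round(params, neighbors, weights):
    """One gossip step: every worker replaces its parameters with a
    weighted average of its own and its neighbors' parameters."""
    new_params = []
    for i, theta in enumerate(params):
        mixed = weights[i][i] * theta
        for j in neighbors[i]:
            mixed = mixed + weights[i][j] * params[j]
        new_params.append(mixed)
    return new_params

# Example: 4 workers on a ring, each holding a scalar "model".
n = 4
params = [np.array([float(i)]) for i in range(n)]
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
# Doubly stochastic mixing: weight 1/3 on self and on each ring neighbor,
# which guarantees convergence of the iterates to the global average.
weights = {i: {j: 1.0 / 3.0 for j in (i, (i - 1) % n, (i + 1) % n)}
           for i in range(n)}

for _ in range(50):
    params = gossip_round(params, neighbors, weights)
print(params)  # every worker is now close to the global average, 1.5
```

With linear mixing like this, consensus is only reached asymptotically; the finite-time consensus property claimed for NGO would replace the fixed linear averaging step with a nonlinear update, whose exact form is specified in the body of the paper rather than here.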