BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning

A decentralized algorithm is a form of computation that achieves a global goal through local dynamics that rely on low-cost communication between directly connected agents. On large-scale optimization tasks involving distributed datasets, decentralized algorithms have shown strong, sometimes superior, performance compared with distributed algorithms that use a central node. Recently, developing decentralized algorithms for deep learning has attracted great attention. They are considered low-communication-overhead alternatives to algorithms that use a parameter server or the Ring-Allreduce protocol. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. To fill the gap, we introduce BlueFog, a Python library for straightforward, high-performance implementations of diverse decentralized algorithms. Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms, from those using a static, undirected graph for synchronous operations to those using dynamic and directed graphs for asynchronous operations. BlueFog also adopts several system-level acceleration techniques to further optimize performance on deep learning tasks. On mainstream DNN training tasks, BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package based on Ring-Allreduce. BlueFog is open source at https://github.com/Bluefog-Lib/bluefog.
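To make the idea of reaching a global goal through local dynamics concrete, the following is a minimal single-process sketch of neighbor averaging on a static, undirected ring. It is not BlueFog code and does not use the BlueFog API. The function names, the ring topology, and the uniform 1/3 averaging weights are all assumptions chosen for illustration. Each simulated agent only reads the values of its two direct neighbors, yet every agent converges to the global average.

```python
# Illustrative simulation only: plain Python, no BlueFog calls.
# All names below (ring_neighbors, neighbor_average_step, run_consensus)
# are hypothetical helpers, not part of any library.

def ring_neighbors(i, n):
    """Return the two directly connected neighbors of agent i on a ring of n agents."""
    return [(i - 1) % n, (i + 1) % n]


def neighbor_average_step(values):
    """One synchronous round: every agent averages its own value with its neighbors'."""
    n = len(values)
    new_values = []
    for i in range(n):
        local = [values[i]] + [values[j] for j in ring_neighbors(i, n)]
        new_values.append(sum(local) / len(local))  # uniform weights (assumed)
    return new_values


def run_consensus(values, rounds):
    """Repeat the local averaging step for a fixed number of rounds."""
    for _ in range(rounds):
        values = neighbor_average_step(values)
    return values


initial = [float(i) for i in range(8)]   # agent i starts with value i
target = sum(initial) / len(initial)     # the global average no single agent knows
final = run_consensus(initial, rounds=200)
max_error = max(abs(v - target) for v in final)
print(f"largest deviation from the global average: {max_error:.2e}")
```

Because the averaging weights are symmetric, each round preserves the sum of all values, so the only point the agents can agree on is the global mean. In a real deployment each agent would run on its own process or device and exchange values over the network rather than read a shared list.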
