Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (i.e., bits) with their neighbors over a predefined graph topology. Despite significant efforts, state-of-the-art algorithms in the nonconvex setting still suffer from a slower convergence rate of $O((G/T)^{2/3})$ compared with their uncompressed counterparts, where $G$ measures the data heterogeneity across the clients and $T$ is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate by matching the rate without compression even under arbitrary data heterogeneity. Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
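To make the two ingredients named above concrete, the following NumPy sketch illustrates one round of a BEER-style update: each client mixes compressed surrogates of its neighbors' iterates over a gossip matrix $W$ and maintains a gradient tracker. This is a minimal sketch, not the paper's pseudocode; the top-$k$ compressor, the step sizes `gamma` and `eta`, the helper names, and the toy quadratic objectives are all illustrative assumptions.

```python
import numpy as np

def topk_compress(u, k):
    """Keep the k largest-magnitude entries of u, zero the rest
    (a standard contractive compressor; an illustrative choice)."""
    out = np.zeros_like(u)
    idx = np.argsort(np.abs(u))[-k:]
    out[idx] = u[idx]
    return out

def beer_style_step(X, V, H, G, grads, W, grad_f, gamma, eta, k):
    """One BEER-style round. Rows of X are client iterates, V is the
    gradient tracker, and H, G are compressed surrogates of X and V
    that all clients share. Names and update order are a sketch."""
    # Mixing uses the surrogates H, i.e., what neighbors actually know.
    X_new = X + gamma * (W @ H - H) - eta * V
    # Each client transmits only a compressed correction to its surrogate.
    H = H + np.apply_along_axis(topk_compress, 1, X_new - H, k)
    new_grads = np.stack([grad_f(i, x) for i, x in enumerate(X_new)])
    # Gradient tracking keeps V an estimate of the average gradient.
    V_new = V + gamma * (W @ G - G) + new_grads - grads
    G = G + np.apply_along_axis(topk_compress, 1, V_new - G, k)
    return X_new, V_new, H, G, new_grads

# Toy usage: n clients on a ring, heterogeneous quadratic objectives
# f_i(x) = ||x - a_i||^2 / 2 with distinct local minimizers a_i.
n, d = 4, 10
rng = np.random.default_rng(0)
targets = rng.standard_normal((n, d))
grad_f = lambda i, x: x - targets[i]
W = np.zeros((n, n))                     # doubly stochastic ring mixing
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25
X = np.zeros((n, d))
V = np.stack([grad_f(i, X[i]) for i in range(n)])  # v^0 = local gradients
H, G, grads = np.zeros_like(X), np.zeros_like(V), V.copy()
for _ in range(200):
    X, V, H, G, grads = beer_style_step(X, V, H, G, grads, W, grad_f,
                                        gamma=0.3, eta=0.1, k=3)
```

One design point the sketch makes visible: because clients communicate compressed *differences* against shared surrogates (and the tracker estimates the global gradient), the compression error does not accumulate with the local data heterogeneity, which is what allows the rate to avoid the $G$-dependence of the $O((G/T)^{2/3})$ baselines.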