Decentralized optimization over time-varying networks is an emerging paradigm in machine learning. It saves considerable communication overhead in large-scale deep training and is more robust in wireless scenarios, especially when nodes are moving. Federated learning can also be regarded as decentralized optimization with time-varying communication patterns that alternate between global averaging and local updates. While numerous studies have sought to clarify its theoretical limits and to develop efficient algorithms, the optimal complexity of non-convex decentralized stochastic optimization over time-varying networks remains unclear. The main difficulties lie in how to gauge the effectiveness of transmitting messages between two nodes via time-varying communications, and how to establish the lower bound when the network size is fixed (which is a prerequisite in stochastic optimization). This paper resolves these challenges and establishes the first lower-bound complexity. We also develop a new decentralized algorithm that nearly attains the lower bound, showing the tightness of the lower bound and the optimality of our algorithm.
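To make the setting concrete, the sketch below illustrates decentralized stochastic gradient descent over a time-varying network: at each round, nodes mix their iterates through a doubly stochastic matrix that changes over time (a ring graph on even rounds, full averaging on odd rounds, mirroring the federated-learning pattern of local updates alternating with global averaging). This is an illustrative toy, not the algorithm proposed in the paper; the quadratic local objectives, the noise level, the topologies, and the step size are hypothetical choices for demonstration only.

```python
# Minimal sketch (assumed setup, not the paper's method): decentralized SGD
# where each node gossips with neighbors via a time-varying mixing matrix W_t,
# then takes a local stochastic gradient step.
import numpy as np

n_nodes, dim, n_iters, lr = 8, 5, 200, 0.05
rng = np.random.default_rng(0)

# Hypothetical local objectives f_i(x) = 0.5 * ||x - a_i||^2 (simple quadratics
# stand in for the non-convex losses considered in the paper).
targets = rng.normal(size=(n_nodes, dim))

def stochastic_grad(i, x):
    # Gradient of f_i plus Gaussian noise, mimicking a stochastic oracle.
    return (x - targets[i]) + 0.1 * rng.normal(size=dim)

def mixing_matrix(t):
    # Time-varying communication: odd rounds perform full (global) averaging,
    # even rounds gossip over a ring; both matrices are doubly stochastic.
    if t % 2 == 1:
        return np.full((n_nodes, n_nodes), 1.0 / n_nodes)
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, i] = 0.5
        W[i, (i + 1) % n_nodes] = 0.25
        W[i, (i - 1) % n_nodes] = 0.25
    return W

x = np.zeros((n_nodes, dim))              # one local iterate per node
for t in range(n_iters):
    x = mixing_matrix(t) @ x              # communication over the current graph
    for i in range(n_nodes):
        x[i] -= lr * stochastic_grad(i, x[i])   # local stochastic gradient step

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```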