We propose Adaptive Compressed Gradient Descent (AdaCGD), a novel optimization algorithm for communication-efficient training of supervised machine learning models that adapts its compression level during training. Our approach is inspired by the recently proposed three point compressor (3PC) framework of Richtarik et al. (2022), which includes error feedback (EF21), lazily aggregated gradients (LAG), and their combination as special cases, and offers the current state-of-the-art rates for these methods under weak assumptions. While the above mechanisms offer a fixed compression level, or adapt between two extremes only, our proposal is to perform a much finer adaptation. In particular, we allow the user to choose any number of arbitrarily chosen contractive compression mechanisms, such as Top-K sparsification with a user-defined selection of sparsification levels K, or quantization with a user-defined selection of quantization levels, or their combination. AdaCGD chooses the appropriate compressor and compression level adaptively during the optimization process. Besides i) proposing a theoretically grounded multi-adaptive communication compression mechanism, we further ii) extend the 3PC framework to bidirectional compression, i.e., we allow the server to compress as well, and iii) provide sharp convergence bounds in the strongly convex, convex and nonconvex settings. The convex regime results are new even for several key special cases of our general mechanism, including 3PC and EF21. In all regimes, our rates are superior compared to all existing adaptive compression methods.
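To make the idea of choosing among several contractive compressors concrete, the following is a minimal illustrative sketch in Python. It implements standard Top-K sparsification and a simple error-based rule that picks the smallest user-supplied level K whose relative compression error is below a threshold. The function names, the `tol` parameter, and the selection criterion are assumptions introduced for illustration only; they are not the AdaCGD selection rule, which is defined in the paper.

```python
import numpy as np

def top_k(x, k):
    """Top-K sparsification: keep the k largest-magnitude entries, zero the rest.
    A standard contractive compressor with contraction parameter k / len(x)."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def adaptive_top_k(g, levels, tol=0.5):
    """Illustrative adaptive selection among user-chosen sparsification levels.

    Picks the smallest (cheapest) level K whose relative compression error
    ||g - TopK(g)||^2 / ||g||^2 falls below `tol`. This criterion is a
    hypothetical stand-in for the paper's adaptive rule.
    """
    g_norm2 = np.dot(g, g)
    for k in sorted(levels):                     # try the cheapest level first
        c = top_k(g, k)
        err2 = np.dot(g - c, g - c)
        if g_norm2 == 0.0 or err2 <= tol * g_norm2:
            return c, k
    k = max(levels)
    return top_k(g, k), k                        # fall back to the finest level

# Example: compress a gradient with candidate levels K in {10, 50, 200}.
rng = np.random.default_rng(0)
grad = rng.standard_normal(1000)
compressed, chosen_k = adaptive_top_k(grad, levels=[10, 50, 200])
print(f"chosen K = {chosen_k}")
```

In this toy version the worker sends only the `chosen_k` nonzero entries per round, spending more bandwidth only when a coarse compressor would distort the gradient too much; the same template accommodates quantizers or mixed compressor families by replacing `top_k` with any contractive operator.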