Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm that achieves the same communication acceleration as their method (ProxSkip). Our approach is very different, however: it is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications: i) we allow for an inexact computation of the prox operator of a certain smooth strongly convex function via a suitable gradient-based method (e.g., GD, Fast GD, or FSFOM); ii) we carefully modify the dual update step in order to retain linear convergence. Our general results yield new state-of-the-art rates for the class of strongly convex-concave saddle-point problems with bilinear coupling characterized by the absence of smoothness in the dual function. When applied to federated learning, we obtain a theoretically better alternative to ProxSkip: our method requires fewer local steps ($O(\kappa^{1/3})$ or $O(\kappa^{1/4})$, compared to $O(\kappa^{1/2})$ for ProxSkip), and the number of local steps is deterministic rather than random. Like ProxSkip, our method can be applied to optimization over a connected network, and we obtain theoretical improvements there as well.
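For context, a minimal sketch of the problem template the abstract refers to, in generic notation that is not necessarily the paper's own: saddle-point problems with bilinear coupling are commonly written as
$$\min_{x} \max_{y} \; f(x) + \langle K x, y \rangle - g(y),$$
where, roughly, $f$ is smooth and strongly convex, $g$ is convex, and $K$ is a linear coupling operator; the "dual function" mentioned above need not be smooth. Chambolle--Pock-type methods alternate a prox step in the primal variable and a prox step in the dual variable, and the modification described in the abstract replaces one exact prox evaluation (that of a smooth strongly convex function) with a few steps of a gradient-based inner solver.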