We introduce \algname{ProxSkip} -- a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (\algname{ProxGD}) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. \algname{ProxSkip} allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\cO(\kappa \log \nicefrac{1}{\varepsilon})$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\cO(\sqrt{\kappa} \log \nicefrac{1}{\varepsilon})$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local \algname{GD} step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, \algname{ProxSkip} offers an effective {\em acceleration} of communication complexity. Unlike other local gradient-type methods, such as \algname{FedAvg}, \algname{SCAFFOLD}, \algname{S-Local-GD} and \algname{FedLin}, whose theoretical communication complexity is worse than, or at best matching, that of vanilla \algname{GD} in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.
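For concreteness, here is a minimal sketch of the kind of randomized prox-skipping iteration that \algname{ProxSkip} performs; the stepsize $\gamma$, the prox probability $p$, and the control variate $h_t$ are not defined in this abstract and appear here only for illustration (the algorithm description in the main text is the authoritative statement):
\begin{align*}
\hat{x}_{t+1} &= x_t - \gamma \left(\nabla f(x_t) - h_t\right), \\
x_{t+1} &= \begin{cases}
\mathrm{prox}_{\frac{\gamma}{p}\psi}\!\left(\hat{x}_{t+1} - \tfrac{\gamma}{p}\, h_t\right) & \text{with probability } p, \\
\hat{x}_{t+1} & \text{with probability } 1-p,
\end{cases} \\
h_{t+1} &= h_t + \frac{p}{\gamma}\left(x_{t+1} - \hat{x}_{t+1}\right).
\end{align*}
With the choice $p \approx \nicefrac{1}{\sqrt{\kappa}}$, the prox is evaluated only on an expected $p$-fraction of the $\cO(\kappa \log \nicefrac{1}{\varepsilon})$ iterations, which is how the $\cO(\sqrt{\kappa} \log \nicefrac{1}{\varepsilon})$ bound on prox evaluations quoted above arises.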