This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. More specifically, we consider a structured stochastic weakly convex objective function that is the composition of a convex function and a smooth nonconvex function. Recently, Xu et al. (2022) showed that an inertial stochastic subgradient method converges at a rate of $\mathcal{O}(\tau/\sqrt{K})$, which suffers a significant penalty from the maximum information delay $\tau$. To alleviate this issue, we propose a new delayed stochastic prox-linear ($\texttt{DSPL}$) method in which the master performs the proximal update of the parameters and the workers only need to linearly approximate the inner smooth function. Somewhat surprisingly, we show that the delays only affect the higher-order term in the complexity rate and are hence negligible after a certain number of $\texttt{DSPL}$ iterations. Moreover, to further improve the empirical performance, we propose a delayed stochastic extrapolated prox-linear ($\texttt{DSEPL}$) method which employs Polyak-type momentum to speed up convergence. Building on the tools developed for analyzing $\texttt{DSPL}$, we also develop an improved analysis of the delayed stochastic subgradient method ($\texttt{DSGD}$). In particular, for general weakly convex problems, we show that the convergence of $\texttt{DSGD}$ depends only on the expected delay.
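To fix ideas, a minimal sketch of such a delayed prox-linear step is given below; the notation is ours and not taken verbatim from the paper, assuming a composite objective $F(x) = \mathbb{E}_{\xi}[\,h(c(x;\xi))\,]$ with $h$ convex and $c$ smooth, a stepsize $\gamma > 0$, and a stale iterate $x_{k-\tau_k}$ held by the worker that supplies the $k$-th sample:

$$
x_{k+1} \in \operatorname*{argmin}_{x}\; h\Bigl(c(x_{k-\tau_k};\xi_k) + \nabla c(x_{k-\tau_k};\xi_k)^\top \bigl(x - x_{k-\tau_k}\bigr)\Bigr) + \frac{1}{2\gamma}\,\|x - x_k\|^2.
$$

In this sketch the worker only evaluates $c$ and its Jacobian at its stale copy of the parameters, while the master solves the resulting convex proximal subproblem centered at the current iterate $x_k$; a delayed subgradient method of $\texttt{DSGD}$ type would instead step along a stale stochastic subgradient of $h \circ c$.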