The stochastic subgradient method is a widely used algorithm for solving large-scale optimization problems arising in machine learning. Often these problems are neither smooth nor convex. Recently, Davis et al. [1, 2] characterized the convergence of the stochastic subgradient method for the weakly convex case, which encompasses many important applications (e.g., robust phase retrieval, blind deconvolution, biconvex compressive sensing, and dictionary learning). In practice, distributed implementations of the projected stochastic subgradient method (stoDPSM) are used to speed up risk minimization. In this paper, we propose a distributed implementation of the stochastic subgradient method with a theoretical guarantee. Specifically, we show the global convergence of stoDPSM using the Moreau envelope stationarity measure. Furthermore, under a so-called sharpness condition, we show that deterministic DPSM (with proper initialization) converges linearly to the sharp minima using a geometrically diminishing step size. We provide numerical experiments to support our theoretical analysis.
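As an illustration only (not the paper's distributed stoDPSM algorithm), the sketch below shows a single-machine projected stochastic subgradient update with a geometrically diminishing step size mu_k = mu0 * q^k, applied to a toy least-absolute-deviations problem over a Euclidean ball; the problem instance, function names, and parameter values are all hypothetical assumptions made for this example.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def projected_stochastic_subgradient(A, b, x0, mu0=0.1, q=0.95, epochs=50, rng=None):
    """Toy sketch: minimize (1/m) * sum_i |a_i^T x - b_i| over a Euclidean ball.

    Each iteration takes a stochastic subgradient of one sampled term and
    projects back onto the constraint set; the step size shrinks
    geometrically across epochs, mu_k = mu0 * q**k.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = project_ball(x0)
    m = A.shape[0]
    for k in range(epochs):
        mu_k = mu0 * q**k                    # geometrically diminishing step size
        for _ in range(m):
            i = rng.integers(m)              # sample one data point
            r = A[i] @ x - b[i]
            g = np.sign(r) * A[i]            # subgradient of |a_i^T x - b_i|
            x = project_ball(x - mu_k * g)   # subgradient step + projection
    return x

# Hypothetical usage on synthetic data
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = project_ball(rng.standard_normal(10))
b = A @ x_true
x_hat = projected_stochastic_subgradient(A, b, x0=np.zeros(10), rng=rng)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```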