Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. Conventionally, the surrogates are required to be strongly convex, and no convergence rate analysis was available for the general nonconvex setting. In this paper, we propose an extension of SMM in which the surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under an additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods in the general nonconvex and dependent data setting: double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
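For concreteness, the display below is a minimal sketch of the proximally regularized SMM iteration described above (the weakly convex surrogate case); the weight sequence $w_n \in (0,1]$, the per-sample surrogate $g_n$, and the proximal parameter $\lambda > 0$ are illustrative notation rather than the paper's own symbols.

$$
\bar{g}_n(\theta) \;=\; (1 - w_n)\,\bar{g}_{n-1}(\theta) \;+\; w_n\, g_n(\theta),
\qquad
\theta_n \;\approx\; \operatorname*{arg\,min}_{\theta \in \Theta}\;
\Big( \bar{g}_n(\theta) + \tfrac{\lambda}{2}\,\|\theta - \theta_{n-1}\|^{2} \Big),
$$

where $g_n$ is a weakly convex surrogate of the loss at the $n$-th data sample, $\bar{g}_n$ is the recursively averaged surrogate, and $\Theta$ is the constraint set; in the block multi-convex case, the proximal step is replaced by block minimization within diminishing radii, as stated in the abstract.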