随机逼近与随机梯度下降算法的再探讨 (Revisiting Stochastic Approximation and Stochastic Gradient Descent)

In this paper, we introduce a new approach to proving the convergence of the Stochastic Approximation (SA) and the Stochastic Gradient Descent (SGD) algorithms. The new approach is based on a concept called GSLLN (Generalized Strong Law of Large Numbers), which extends the traditional SLLN. Using this concept, we provide sufficient conditions for convergence, which effectively decouple the properties of the function whose zero we are trying to find, from the properties of the measurement errors (noise sequence). The new approach provides an alternative to the two widely used approaches, namely the ODE approach and the martingale approach, and also permits a wider class of noise signals than either of the two known approaches. In particular, the ``noise'' or measurement error \textit{need not} have a finite second moment, and under suitable conditions, not even a finite mean. By adapting this method of proof, we also derive sufficient conditions for the convergence of zero-order SGD, wherein the stochastic gradient is computed using $2d$ function evaluations, but no gradient computations. The sufficient conditions derived here are the weakest to date, thus leading to a considerable expansion of the applicability of SA and SGD theory.

翻译：本文提出了一种证明随机逼近（SA）与随机梯度下降（SGD）算法收敛性的新方法。该方法基于一种称为广义强大数定律（GSLLN）的概念，该概念扩展了传统的大数定律。利用这一概念，我们给出了收敛的充分条件，这些条件有效地将目标函数（其零点为求解目标）的性质与测量误差（噪声序列）的性质解耦。这一新方法为两种广泛使用的经典方法（即常微分方程方法及鞅方法）提供了替代方案，且允许比已知两种方法更广泛的噪声信号类别。特别地，测量误差或“噪声”无需具有有限的二阶矩，在适当条件下甚至无需具有有限的一阶矩。通过调整此证明方法，我们还推导了零阶SGD收敛的充分条件，其中随机梯度通过$2d$次函数评估计算，而无需进行梯度计算。本文推导的充分条件是迄今为止最弱的条件，从而显著拓展了SA与SGD理论的应用范围。