We study the statistical and computational complexity of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions on the population loss function, namely, the limit of the empirical loss function as the sample size goes to infinity, together with a stability condition between the gradients of the empirical and population loss functions, namely, a polynomial growth condition on the concentration bound between the gradients of the sample and population loss functions. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after a number of iterations that is logarithmic in the sample size. This is computationally cheaper than the fixed step size gradient descent algorithm, which requires a number of iterations polynomial in the sample size to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate our general theory with three statistical examples: the generalized linear model, the mixture model, and the mixed linear regression model.
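For reference, the classical Polyak step size for gradient descent on a loss function $f$ with known minimal value $f^\ast$ is
\[
\theta_{t+1} = \theta_t - \eta_t \, \nabla f(\theta_t), \qquad \eta_t = \frac{f(\theta_t) - f^\ast}{\|\nabla f(\theta_t)\|^2},
\]
here applied to the empirical loss function. This is a sketch of the standard definition; the exact variant analyzed in the paper may differ in its normalization or its estimate of $f^\ast$.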