Momentum methods have been shown, both in theory and in practice, to accelerate the convergence of the standard gradient descent algorithm. In particular, minibatch gradient descent methods with momentum (MGDM) are widely used to solve large-scale optimization problems on massive datasets. Despite the practical success of MGDM methods, their theoretical properties remain underexplored. To this end, we investigate the theoretical properties of MGDM methods in the context of linear regression models. We first study the numerical convergence properties of the MGDM algorithm and provide the theoretically optimal tuning parameter specification that achieves a faster convergence rate. In addition, we explore the relationship between the statistical properties of the resulting MGDM estimator and the tuning parameters. Based on these theoretical findings, we derive the conditions under which the resulting estimator achieves optimal statistical efficiency. Finally, extensive numerical experiments are conducted to verify our theoretical results.
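To make the MGDM recursion concrete, the following is a minimal sketch of minibatch gradient descent with heavy-ball momentum applied to a least-squares objective. The step size alpha, momentum weight gamma, batch size, and iteration count are illustrative placeholders, not the optimal specification derived in the paper.

```python
import numpy as np

def mgdm_linear_regression(X, y, alpha=0.05, gamma=0.9,
                           batch_size=32, n_iters=2000, seed=0):
    """Minibatch gradient descent with (heavy-ball) momentum for
    least squares.  alpha and gamma are the two tuning parameters;
    the defaults here are illustrative, not the paper's recommended
    values."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)   # current iterate
    v = np.zeros(p)       # momentum (velocity) term
    for _ in range(n_iters):
        # draw a random minibatch of observations
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # gradient of the least-squares loss on the minibatch
        grad = Xb.T @ (Xb @ theta - yb) / batch_size
        # heavy-ball update: accumulate momentum, then step
        v = gamma * v + alpha * grad
        theta = theta - v
    return theta

# Usage on simulated data: the MGDM iterate should approach the
# true coefficient vector (and hence the OLS solution).
rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 10))
beta = rng.standard_normal(10)
y = X @ beta + 0.5 * rng.standard_normal(5000)
theta_hat = mgdm_linear_regression(X, y)
print(np.linalg.norm(theta_hat - beta))
```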