Using gradient descent (GD) with a fixed or decaying step size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down, as it cannot explore the flat curvature of the loss function. To overcome this issue, we propose exponentially increasing the step size of the GD algorithm. Under homogeneity assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step-size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm to solve parameter estimation under both regular and non-regular statistical models whose loss functions become locally convex as the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, in stark contrast to the \emph{polynomial} number of iterations required by the GD algorithm in non-regular statistical models. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models, while being comparable to that of GD in regular statistical settings. To the best of our knowledge, this resolves a long-standing gap between the statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
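To make the scheme concrete, here is a minimal sketch of the EGD update $\theta_{t+1} = \theta_t - \eta_0 \alpha^t \nabla f(\theta_t)$ with $\alpha > 1$, run on a toy quartic loss whose Hessian vanishes at the optimum. The function name `egd` and the constants `eta0` and `alpha` are illustrative choices for this demo, not values prescribed by the theory.

```python
import numpy as np

def egd(grad, theta0, eta0=0.01, alpha=1.2, n_iters=200):
    """Gradient descent with an exponentially increasing step size
    eta_t = eta0 * alpha**t (illustrative constants, not from the paper)."""
    theta = np.asarray(theta0, dtype=float)
    for t in range(n_iters):
        theta = theta - eta0 * alpha**t * grad(theta)
    return theta

# Toy locally convex loss f(theta) = ||theta - theta_star||^4:
# its Hessian vanishes at theta_star, so fixed-step GD slows to a crawl there,
# while the growing step size keeps the contraction factor bounded away from 1.
theta_star = np.array([1.0, -2.0])
grad = lambda theta: 4.0 * np.sum((theta - theta_star) ** 2) * (theta - theta_star)

theta_hat = egd(grad, theta0=np.zeros(2))
print(np.linalg.norm(theta_hat - theta_star))  # distance decays geometrically in t
```

On this quartic example the effective contraction per step settles near $\alpha^{-1/2}$, which is the linear-rate behavior the abstract describes; with a fixed step size the same loss only admits a sublinear rate.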