We analyze the convergence of a nonlocal gradient descent method for minimizing a class of high-dimensional non-convex functions, where directional Gaussian smoothing (DGS) is used to define the nonlocal gradient (also referred to as the DGS gradient). The method was first proposed in [42], where multiple numerical experiments showed that replacing the traditional local gradient with the DGS gradient can help the optimizers escape local minima more easily and significantly improve their performance. However, a rigorous theory for the efficiency of the method on nonconvex landscapes has been lacking. In this work, we investigate the scenario where the objective function is composed of a convex function perturbed by oscillating noise. We provide a convergence theory under which the iterates converge exponentially to a tightened neighborhood of the solution, whose size is characterized by the noise wavelength. We also establish a correlation between the optimal value of the Gaussian smoothing radius and the noise wavelength, thus justifying the advantage of using a moderate or large smoothing radius with the method. Furthermore, if the noise level decays to zero when approaching the global minimum, we prove that DGS-based optimization converges to the exact global minimum at a linear rate, similarly to standard gradient-based methods applied to convex functions. Several numerical experiments are provided to confirm our theory and illustrate the superiority of the approach over those based on the local gradient.
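To make the method concrete, the following is a minimal NumPy sketch of DGS-based descent, assuming the coordinate-wise Gauss-Hermite quadrature form of the DGS gradient from [42]: along each coordinate direction, the 1D cross-section of the objective is smoothed with a Gaussian of radius sigma and the smoothed cross-section is differentiated at the current point. The function names, step size, node count, and toy objective below are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def dgs_gradient(f, x, sigma, num_nodes=7):
    """DGS gradient of f at x: for each coordinate direction e_i, smooth the
    1D cross-section y -> f(x + y e_i) with a Gaussian of radius sigma and
    differentiate the smoothed function at y = 0, via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(num_nodes)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        # Evaluate f along the i-th coordinate at the quadrature nodes.
        vals = np.array([f(x + np.sqrt(2.0) * sigma * t * e) for t in nodes])
        # Derivative of the Gaussian-smoothed cross-section at offset 0.
        grad[i] = np.sqrt(2.0 / np.pi) * np.dot(weights * nodes, vals) / sigma
    return grad

def dgs_descent(f, x0, sigma, lr=0.05, iters=500):
    """Gradient descent with the local gradient replaced by the DGS gradient."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        x -= lr * dgs_gradient(f, x, sigma)
    return x

# Toy objective matching the setting analyzed here: a convex quadratic
# perturbed by oscillating noise with a small wavelength.
wavelength = 0.1
f = lambda x: np.dot(x, x) + 0.2 * np.sum(np.cos(2.0 * np.pi * x / wavelength))

# A smoothing radius well above the noise wavelength damps the oscillations,
# so the iterates settle near the minimizer of the convex component.
x_final = dgs_descent(f, x0=np.full(10, 2.0), sigma=0.5)
```

In this sketch the smoothed quadratic keeps its exact local gradient, while the derivative of the smoothed oscillatory term is attenuated by a factor decaying in sigma/wavelength, which is the mechanism behind preferring a moderate or large smoothing radius.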