Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent escapes saddle points asymptotically and converges to local minimizers [Lee et al. 2016]. Recent studies also show that a perturbed version of gradient descent suffices to escape saddle points efficiently [Jin et al. 2015, Ge et al. 2017]. In this paper we show a negative result: gradient descent can take exponential time to escape saddle points, even on non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments that verify our theoretical result. Our analysis demonstrates that stochasticity is essential for escaping saddle points efficiently.
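As a minimal illustrative sketch (not the paper's construction), the snippet below contrasts plain gradient descent with a perturbed variant on the toy saddle f(x, y) = x^2 - y^2. The step size, noise level, and starting point are assumptions chosen for illustration: initialized exactly on the stable manifold (y = 0), plain gradient descent stays stuck at the saddle, while small random perturbations let the iterate move off the manifold and escape.

```python
# Sketch only: toy saddle f(x, y) = x^2 - y^2 with gradient (2x, -2y).
# Plain GD started at y = 0 converges to the saddle at the origin;
# adding a small Gaussian perturbation each step pushes y off zero,
# after which the y-coordinate grows and the iterate leaves the saddle.
import numpy as np

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def gradient_descent(p0, eta=0.1, steps=100, noise=0.0, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - eta * grad(p)            # standard gradient step
        if noise > 0.0:
            p = p + noise * rng.standard_normal(2)  # perturbation step
    return p

if __name__ == "__main__":
    start = (1.0, 0.0)  # on the stable manifold of the saddle at the origin
    print("plain GD:    ", gradient_descent(start))               # y stays 0, stuck near (0, 0)
    print("perturbed GD:", gradient_descent(start, noise=1e-3))   # |y| grows, escapes the saddle
```

Note that this toy objective is unbounded below, so the escaping iterate simply diverges along the y-direction; it only serves to illustrate the qualitative gap between the two methods, not the exponential-time lower bound established in the paper.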