Studying the properties of stochastic noise for optimizing complex non-convex functions has been an active area of research in machine learning. Prior work has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the loss landscape. Moreover, injecting artificial Gaussian noise has become a popular idea for quickly escaping saddle points. Indeed, in the absence of reliable gradient information, the noise is used to explore the landscape, but it is unclear what type of noise is optimal in terms of exploration ability. In order to narrow this gap in our knowledge, we study a general type of continuous-time non-Markovian process, based on fractional Brownian motion, that allows the increments of the process to be correlated. This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such processes, which gives rise to a new algorithm, fPGD. This method generalizes the known algorithms PGD and Anti-PGD. We study the properties of fPGD both theoretically and empirically, demonstrating that it possesses exploration abilities that, in some cases, are favorable compared to PGD and Anti-PGD. These results open the field to novel ways of exploiting noise for training machine learning models.
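The abstract does not spell out the fPGD update rule, but the idea it describes, perturbed gradient descent whose injected noise sequence has the correlation structure of fractional Brownian motion increments, can be sketched as follows. The function names, the step size, and the way the correlated noise is generated (Cholesky factorization of the fBm-increment covariance) are illustrative assumptions, not the paper's exact method. With Hurst index H = 0.5 the increments are independent, recovering PGD-style isotropic noise; H < 0.5 yields anticorrelated increments in the spirit of Anti-PGD.

```python
import numpy as np

def fbm_increment_cov(n, hurst):
    """Covariance matrix of n unit-time increments of fractional
    Brownian motion with Hurst index `hurst`:
    gamma(k) = 0.5 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2 * k ** h2 + np.abs(k - 1) ** h2)

def fpgd(grad, x0, steps, lr=0.1, sigma=0.1, hurst=0.5, seed=None):
    """Hypothetical sketch of fPGD: gradient descent perturbed by
    Gaussian noise whose covariance across iterations matches the
    increments of fractional Brownian motion.

    hurst = 0.5 -> independent perturbations (PGD-like);
    hurst < 0.5 -> anticorrelated perturbations (Anti-PGD-like)."""
    rng = np.random.default_rng(seed)
    cov = fbm_increment_cov(steps, hurst)
    # Small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(steps))
    x = np.asarray(x0, dtype=float)
    # One correlated noise sequence per coordinate of x.
    noise = chol @ rng.standard_normal((steps, x.size))
    for t in range(steps):
        x = x - lr * grad(x) + sigma * noise[t]
    return x
```

As a quick sanity check on a toy quadratic f(x) = ||x||^2 (gradient 2x), anticorrelated noise (hurst = 0.3) still lets the iterate settle near the minimizer:

```python
grad = lambda x: 2.0 * x
x_final = fpgd(grad, np.array([3.0, -2.0]), steps=200,
               lr=0.1, sigma=0.01, hurst=0.3, seed=0)
```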