Davis and Drusvyatskiy previously showed that every Clarke critical point of a generic, semialgebraic (and, more generally, definable in an o-minimal structure) weakly convex function lies on an active manifold and is either a local minimum or an active strict saddle. In the first part of this work, we show that when the weak convexity assumption fails, a third type of point appears: a sharply repulsive critical point. Moreover, we show that the corresponding active manifolds satisfy the Verdier and angle conditions, which we introduced in our previous work. In the second part of this work, we show that, under a density-like assumption on the perturbation sequence, stochastic subgradient descent (SGD) avoids sharply repulsive critical points with probability one. We show that such a density-like assumption can be obtained by adding a small random perturbation (e.g., a nondegenerate Gaussian) at each iteration of the algorithm. These results, combined with our previous work on the avoidance of active strict saddles, show that SGD on a generic definable (e.g., semialgebraic) function converges to a local minimum.
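To make the role of the per-iteration random perturbation concrete, here is a minimal sketch, not the authors' algorithm or analysis. It uses a hypothetical toy objective f(x, y) = |x| - |y| + y^2, which is not weakly convex because of the -|y| term. Its origin is Clarke critical (0 lies in the Clarke subdifferential there) but is not a local minimum, since f(0, t) = -|t| + t^2 < 0 for small t != 0. Everything below is an illustrative assumption: the function names (`f`, `subgradient`, `perturbed_subgradient_method`), the choice of subgradient at the kinks, the diminishing step size a_k = alpha0 / (k + 1)^0.6, and the placement of the Gaussian noise inside the step. The sketch makes no claim about whether the origin is a "sharply repulsive" point in the paper's precise sense.

```python
import numpy as np


def f(z):
    """Hypothetical toy objective f(x, y) = |x| - |y| + y**2 (not weakly convex)."""
    x, y = z
    return abs(x) - abs(y) + y ** 2


def subgradient(z):
    """Return one element of the Clarke subdifferential of f at z.

    np.sign(0.0) == 0.0, so at the kinks this selects the zero element,
    which is a valid Clarke subgradient there.
    """
    x, y = z
    return np.array([np.sign(x), -np.sign(y) + 2.0 * y])


def perturbed_subgradient_method(z0, steps=2000, alpha0=0.1, sigma=0.01, seed=0):
    """Iterate z_{k+1} = z_k - a_k * (g_k + sigma * xi_k), with xi_k ~ N(0, I).

    Assumed step size a_k = alpha0 / (k + 1)**0.6, so sum a_k diverges and
    sum a_k**2 converges. Setting sigma = 0 gives the unperturbed method.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for k in range(steps):
        a_k = alpha0 / (k + 1) ** 0.6
        noise = sigma * rng.standard_normal(z.shape)  # nondegenerate Gaussian
        z = z - a_k * (subgradient(z) + noise)
    return z


# Started exactly at the critical origin, the unperturbed run never moves,
# because the selected subgradient there is zero.
stuck = perturbed_subgradient_method([0.0, 0.0], sigma=0.0)
# With a small Gaussian perturbation the iterates leave the origin and
# settle near one of the minimizers (0, +0.5) or (0, -0.5), where f = -0.25.
escaped = perturbed_subgradient_method([0.0, 0.0], sigma=0.01)
print("unperturbed:", stuck, f(stuck))
print("perturbed:  ", escaped, f(escaped))
```

The unperturbed run stalls only because of the specific subgradient selected at the kink (np.sign(0.0) == 0.0). A different valid selection could move the iterate, so this toy run illustrates why random perturbations help. It is not a substitute for the probability-one avoidance result stated above.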