We introduce and analyze Structured Stochastic Zeroth-order Descent (S-SZD), a finite-difference method that approximates a stochastic gradient along a set of $l\leq d$ orthogonal directions, where $d$ is the dimension of the ambient space. The directions are chosen at random and may change at every step. For smooth convex functions, we prove almost sure convergence of the iterates and a convergence rate on the function values of the form $O((d/l)\, k^{-c})$ for every $c<1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of the number of iterations. Our bound also shows the benefit of using $l$ directions instead of one. For non-convex functions satisfying the Polyak-{\L}ojasiewicz condition, we establish the first convergence rates for stochastic zeroth-order algorithms under this assumption. We corroborate our theoretical findings with numerical simulations in which the assumptions are satisfied, and on the real-world problem of hyper-parameter optimization, observing that S-SZD performs very well in practice.
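To make the idea of a finite-difference gradient surrogate on $l$ random orthogonal directions concrete, here is a minimal Python sketch. The abstract does not specify several details, so these are assumptions made for illustration only: the directions are drawn through a QR factorization of a Gaussian matrix, the estimate uses forward differences scaled by $d/l$, and the step size and discretization schedules (`alpha0 * k ** -0.51`, `h0 / k`) are arbitrary choices. The function names are hypothetical, and the objective is evaluated without noise for simplicity. This should not be read as the authors' exact algorithm.

```python
import numpy as np


def random_orthogonal_directions(d, l, rng):
    """Return a (d, l) matrix whose columns are l orthonormal random directions.

    Assumption: directions come from the reduced QR factorization of a
    Gaussian matrix; the abstract only says they are random and orthogonal.
    """
    q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    return q


def structured_fd_gradient(f, x, directions, h):
    """Forward-difference gradient surrogate along the given directions.

    Assumed form: (d / l) * sum_i [(f(x + h p_i) - f(x)) / h] * p_i.
    """
    d, l = directions.shape
    fx = f(x)
    g = np.zeros(d)
    for i in range(l):
        p = directions[:, i]
        g += (f(x + h * p) - fx) / h * p
    return (d / l) * g


def s_szd_sketch(f, x0, l, n_iters=500, alpha0=0.1, h0=1e-3, seed=0):
    """Zeroth-order descent loop with fresh directions at every step.

    The schedules below are illustrative assumptions, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for k in range(1, n_iters + 1):
        directions = random_orthogonal_directions(d, l, rng)
        alpha = alpha0 * k ** -0.51  # decreasing step size
        h = h0 / k                   # decreasing discretization parameter
        x = x - alpha * structured_fd_gradient(f, x, directions, h)
    return x


if __name__ == "__main__":
    # Smooth convex test function with minimizer at the origin.
    f = lambda x: 0.5 * float(np.dot(x, x))
    x0 = np.ones(10)
    x_final = s_szd_sketch(f, x0, l=5)
    print("f(x0) =", f(x0), " f(x_final) =", f(x_final))
```

Each iteration costs $l + 1$ function evaluations, so a larger $l$ gives a better gradient approximation per step at a higher per-step cost.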