We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. We provide several results, implying that the $\mathcal{O}(\epsilon^{-4})$ upper bound of Ghadimi and Lan~\cite{ghadimi2013stochastic} (for making the average gradient norm less than $\epsilon$) cannot be improved upon, unless a combination of additional assumptions is made. Notably, this holds even if we limit ourselves to convex quadratic functions. We also show that for nonconvex functions, the feasibility of minimizing gradients with SGD is surprisingly sensitive to the choice of optimality criteria.
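For context, the Ghadimi–Lan guarantee referred to above can be stated (informally, and under the usual assumptions of $L$-smoothness and stochastic gradients that are unbiased with variance at most $\sigma^2$; the symbols $L$, $\sigma^2$, and $\Delta := f(x_0) - \inf f$ are not defined in this abstract itself) as follows: running SGD with a suitably chosen constant step size yields
\[
\min_{0 \le t < T} \mathbb{E}\big[\|\nabla f(x_t)\|\big] \le \epsilon
\qquad \text{after} \qquad
T = \mathcal{O}\!\left(\frac{L\Delta}{\epsilon^2} + \frac{L\Delta\,\sigma^2}{\epsilon^4}\right)
\]
iterations, so that the $\sigma^2/\epsilon^4$ term dominates in the noisy regime. Our lower bounds show that this dependence on $\epsilon$ is unimprovable without further assumptions.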