We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothing procedures, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our deterministic algorithm for slightly smooth functions, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
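For reference, the stationarity notion above is the standard one based on the Goldstein $\delta$-subdifferential, and the black-box randomized smoothing referred to is the classical uniform perturbation; a minimal sketch, with notation introduced here only for illustration ($B_\delta(x)$ the Euclidean ball of radius $\delta$ around $x$, $\partial f$ the Clarke subdifferential, $\mathcal{U}$ the uniform distribution), is
\[
\partial_\delta f(x) := \mathrm{conv}\Big(\bigcup_{y\in B_\delta(x)} \partial f(y)\Big),
\qquad
x \text{ is } (\delta,\epsilon)\text{-stationary} \;\iff\; \min_{g\in\partial_\delta f(x)}\|g\| \le \epsilon,
\]
while randomized smoothing replaces $f$ by $f_\delta(x) := \mathbb{E}_{u\sim\mathcal{U}(B_\delta(0))}\big[f(x+u)\big]$, which for an $L$-Lipschitz $f$ is smooth with a smoothness parameter scaling on the order of $\sqrt{d}\,L/\delta$, hence dimension-dependent.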