We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown, non-interacting, smooth, strongly convex functions, and we provide a method that solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, whose complexity grows as the square root of the condition number of $f$. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of $f$ (which would be prohibitively expensive), our methods apply a clean recursive "Big-Step-Little-Step" interleaving of standard methods. The resulting algorithms use $\tilde{\mathcal{O}}(d m)$ space, are numerically stable, and open the door to a more fine-grained understanding of the complexity of convex optimization beyond condition number.
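The abstract only names the algorithmic idea. As a loose illustration, and not the paper's actual recursive, accelerated Big-Step-Little-Step method, the sketch below shows what interleaving "big" and "little" gradient steps buys on a toy separable quadratic whose curvatures split into two widely separated scales; all constants, the toy objective, and the step schedule are invented for this example. The big step would diverge on the stiff coordinates in isolation, but the little steps taken before it have already damped them.

```python
import numpy as np

# Toy illustration only: a separable quadratic f(x) = 0.5 * sum_i a_i * x_i**2
# whose curvatures a_i fall into two widely separated scales. Plain gradient
# descent must use the tiny step 1/L_stiff for stability, so the flat block
# barely moves; interleaving little steps with an occasional big step handles
# both scales.
a = np.array([1e6, 5e5, 1.0, 0.5])   # curvatures: stiff block, then flat block
L_stiff, L_flat = 1e6, 1.0           # smoothness of the two hypothetical components
kappa_stiff = 1e6 / 5e5              # condition number of the stiff block (= 2)

def grad(x):
    return a * x

def f(x):
    return 0.5 * np.sum(a * x * x)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(a.size)

# Enough little steps per big step to damp the stiff block so the big step
# (unstable on that block by itself) cannot amplify it:
# (1 - mu_stiff/L_stiff)^k * (L_stiff/L_flat) < 1 once k ~ kappa_stiff * log(L_stiff/L_flat).
k_little = int(np.ceil(kappa_stiff * (np.log(L_stiff / L_flat) + 1)))

x, evals = x0.copy(), 0
for _ in range(30):                  # outer "big-step" loop
    for _ in range(k_little):        # inner "little-step" loop with step 1/L_stiff
        x -= grad(x) / L_stiff
        evals += 1
    x -= grad(x) / L_flat            # one big step with step 1/L_flat
    evals += 1
print(f"interleaved GD: f(x) = {f(x):.2e} after {evals} gradient evaluations")

# Same budget of plain gradient descent with the only stable fixed step 1/L_stiff.
y = x0.copy()
for _ in range(evals):
    y -= grad(y) / L_stiff
print(f"plain GD:       f(y) = {f(y):.2e} after {evals} gradient evaluations")
```

On this toy instance the interleaved schedule should drive $f$ down by many orders of magnitude within roughly a thousand gradient evaluations, while plain gradient descent with the stable fixed step size leaves the flat block essentially untouched in the same budget. The paper's contribution is the recursive, accelerated analogue of this phenomenon, with the near-optimal complexity scaling as the product of the square roots of the component condition numbers.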