We consider the problem of minimizing the sum of two convex functions: one has Lipschitz-continuous gradients and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum, whose radius depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe that achieves acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, using only gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units running in parallel. We illustrate this fast convergence in synthetic numerical experiments.
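For concreteness, the setting above can be formalized as follows; the symbols $f$, $g$, $\mathcal{X}$, $L$, and $\sigma^2$ are illustrative notation introduced here, not drawn from the paper body. The problem is
\[
\min_{x \in \mathcal{X}} \; F(x) := f(x) + g(x),
\]
where $f$ and $g$ are convex, $\nabla f$ is $L$-Lipschitz, and $f$ is accessed through an unbiased stochastic oracle $\tilde{\nabla} f$ with $\mathbb{E}\big[\tilde{\nabla} f(x)\big] = \nabla f(x)$ and $\mathbb{E}\big\|\tilde{\nabla} f(x) - \nabla f(x)\big\|^2 \le \sigma^2$, while $g$ is "simple" (e.g., its Bregman proximal step is assumed cheap to compute). The guarantee is then an accelerated decrease of $F$ up to a ball around the minimum whose radius scales with $\sigma^2$ and the other problem constants.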
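The parallel Frank-Wolfe claim can likewise be stated schematically; the oracle notation $\mathrm{LMO}$ and the gap written below are illustrative, not taken from the paper body. Given a linear maximization oracle $\mathrm{LMO}(c) \in \arg\max_{v \in \mathcal{X}} \langle c, v \rangle$ over the bounded domain $\mathcal{X}$, the claim is that
\[
\mathbb{E}\Big[\max_{v \in \mathcal{X}} \big\langle \nabla f(x_T),\, x_T - v \big\rangle\Big] \;\le\; \epsilon
\quad \text{after } T = \tilde{O}(1/\sqrt{\epsilon}) \text{ iterations},
\]
with $O(1/\sqrt{\epsilon})$ oracle calls per iteration executed in parallel, so that each iteration costs no more wall-clock time than a single oracle call.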