Variance reduction techniques such as SPIDER, SARAH, and STORM have been extensively studied for improving the convergence rates of stochastic non-convex optimization. These techniques usually maintain and update a sequence of estimators for a single function across iterations. {\it What if we need to track multiple functional mappings across iterations, but have access to stochastic samples of only $\mathcal{O}(1)$ functional mappings at each iteration?} This question has an important application in solving an emerging family of coupled compositional optimization problems of the form $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where each $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate the sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and only $\mathcal{O}(1)$ blocks may be probed at each iteration to obtain their stochastic values and Jacobians. To improve the complexity of solving these problems, we propose a novel stochastic method, the Multi-block-Single-probe Variance Reduced (MSVR) estimator, to track the sequence of $\mathbf g(\mathbf{w})$. It is inspired by STORM but introduces a customized error correction term that alleviates the noise not only in the stochastic samples of the selected blocks but also in the blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems, with improved complexities across a spectrum of settings with non-convex, convex, and strongly convex objectives. Our results improve upon prior ones in several aspects, including the order of the sample complexities and the dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate that the new estimator improves performance.
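To make the tracking problem concrete, the following is a minimal Python sketch of a STORM-style update applied to a single probed block, with all other blocks left unchanged. It is an illustration under stated assumptions, not the MSVR estimator itself: the abstract does not give the form of the customized error correction term, so the correction weight `gamma` is left as a free parameter. The names `block_tracker_step`, `sample_g`, `beta`, and `gamma`, as well as the toy linear maps `A[i]`, are hypothetical and chosen only for this example.

```python
import numpy as np


def block_tracker_step(u, w_new, w_old, block, sample_g, xi, beta, gamma):
    """Update the estimate of one probed block; keep all other blocks as they are.

    u        : (m, p) array, current estimates of g_1(w), ..., g_m(w)
    sample_g : hypothetical oracle, sample_g(i, w, xi) -> stochastic value of g_i(w)
    xi       : the random sample, shared by both oracle calls
    gamma    : weight of the correction term (an assumption of this sketch;
               gamma = 1 - beta gives the standard STORM recursion)
    """
    u_next = u.copy()
    g_new = sample_g(block, w_new, xi)  # stochastic value at the new iterate
    g_old = sample_g(block, w_old, xi)  # same sample, previous iterate
    u_next[block] = (1.0 - beta) * u[block] + beta * g_new + gamma * (g_new - g_old)
    return u_next


# Toy setup (assumed for illustration): g_i(w) = A[i] @ w with additive noise.
rng = np.random.default_rng(0)
m, d, p = 5, 3, 2
A = rng.normal(size=(m, p, d))


def sample_g(i, w, xi):
    return A[i] @ w + xi


w_old = rng.normal(size=d)
w_new = w_old + 0.01 * rng.normal(size=d)
u = np.stack([A[i] @ w_old for i in range(m)])  # start from exact values at w_old

block = int(rng.integers(m))     # probe a single block this iteration
xi = 0.1 * rng.normal(size=p)    # one stochastic sample for that block
beta = 0.1
u_next = block_tracker_step(u, w_new, w_old, block, sample_g, xi,
                            beta=beta, gamma=1.0 - beta)
```

With `gamma = 1 - beta`, the probed block follows the familiar STORM recursion. The blocks that were not probed keep their old estimates even though $\mathbf{w}$ has moved, which is the source of the extra error that the abstract's customized correction term is meant to address.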