Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization

Variance reduction techniques such as SPIDER/SARAH/STORM have been extensively studied to improve the convergence rates of stochastic non-convex optimization. These techniques typically maintain and update a sequence of estimators for a single function across iterations. What if we need to track multiple functional mappings across iterations, but have access to stochastic samples of only $\mathcal{O}(1)$ functional mappings at each iteration? This question arises in an important application: solving an emerging family of coupled compositional optimization problems of the form $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where each $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and only $\mathcal{O}(1)$ blocks may be probed at each iteration to obtain their stochastic values and Jacobians. To improve the complexity of solving these problems, we propose a novel stochastic method named the Multi-block-Single-probe Variance Reduced (MSVR) estimator to track the sequence of $\mathbf g(\mathbf{w})$. It is inspired by STORM but introduces a customized error correction term to alleviate the noise not only in the stochastic samples for the selected blocks but also in the blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems with improved complexities across a spectrum of settings with non-convex/convex/strongly convex/Polyak-{\L}ojasiewicz (PL) objectives. Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate that the new estimator yields better performance.
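
The abstract does not state the update rule itself, so the following is only a minimal sketch of what a STORM-style tracker over $m$ blocks could look like when a few blocks are probed per iteration. It assumes scalar-valued blocks for brevity. The function name `block_tracker_update`, the parameters `beta` and `gamma`, and the exact form of the correction term are illustrative assumptions, not the paper's definition of MSVR.

```python
def block_tracker_update(u, sampled, g_new, g_old, beta, gamma):
    """One step of a STORM-style tracker over several blocks (illustrative sketch).

    u       : list of current estimates, u[i] approximating g_i(w_t)
    sampled : indices of the few blocks probed at this iteration
    g_new   : dict, g_new[i] is a stochastic value of g_i at the new iterate
    g_old   : dict, g_old[i] is a stochastic value of g_i at the previous
              iterate, drawn with the same random sample as g_new[i]
    beta    : moving-average weight in (0, 1]  (assumed hyperparameter)
    gamma   : weight of the error correction term (assumed hyperparameter)
    """
    updated = list(u)  # blocks that are not probed keep their old estimate
    for i in sampled:
        correction = g_new[i] - g_old[i]  # change of g_i between iterates
        updated[i] = (1 - beta) * u[i] + beta * g_new[i] + gamma * correction
    return updated


# Toy usage: three blocks, only block 1 is probed.
u = [0.0, 0.0, 0.0]
updated = block_tracker_update(
    u, sampled=[1], g_new={1: 2.0}, g_old={1: 1.5}, beta=0.5, gamma=0.25
)
# updated[1] = 0.5 * 0.0 + 0.5 * 2.0 + 0.25 * (2.0 - 1.5) = 1.125
```

In this sketch, setting `gamma = 0` reduces the sampled-block update to a plain moving average, while a positive `gamma` adds the difference between two evaluations that share the same random sample, which is the general idea behind STORM-style corrections. How the correction weight should be chosen to also account for the blocks that are not sampled is the paper's contribution and is not reproduced here.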
