Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA) or the periodic full-gradient computations they require (SVRG/SARAH) are manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly convex case and show that, while it does not recover the linear rate of control-variate methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree that allows its efficient implementation. Finally, we present empirical results to support our findings.
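To give a rough sense of the data-structure requirement mentioned above, the sketch below implements O(log n) weight updates and O(log n) proportional sampling using a Fenwick (binary indexed) tree rather than the paper's specialized red-black tree; the class name `FenwickSampler` and the example weights are hypothetical, and this is only an illustration of the sampling functionality, not the authors' implementation.

```python
import random


class FenwickSampler:
    """Maintains n nonnegative weights; supports O(log n) weight updates and
    O(log n) sampling of index i with probability w_i / sum_j w_j."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, new_weight):
        # Add the change in weight to every Fenwick node covering index i.
        delta = new_weight - self.get(i)
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & (-j)

    def get(self, i):
        # Recover the current weight of index i as a prefix-sum difference.
        return self._prefix(i + 1) - self._prefix(i)

    def _prefix(self, i):
        s = 0.0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def total(self):
        return self._prefix(self.n)

    def sample(self):
        # Draw u uniformly in [0, total) and descend the implicit tree to find
        # the smallest 0-based index whose cumulative weight exceeds u.
        u = random.random() * self.total()
        idx, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        while bit:
            nxt = idx + bit
            if nxt <= self.n and self.tree[nxt] <= u:
                u -= self.tree[nxt]
                idx = nxt
            bit //= 2
        return idx


# Hypothetical usage: importance sampling over 5 components with evolving weights.
sampler = FenwickSampler([1.0, 2.0, 0.5, 3.0, 1.5])
i = sampler.sample()      # index drawn proportionally to its current weight
sampler.update(i, 0.1)    # reweight component i after using its gradient
```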