We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates. The algorithm is optimal with respect to its dependence on both the minibatch size and minimum expected loss simultaneously. This improves over the optimal method of Lan (2012), which is insensitive to the minimum expected loss; over the optimistic acceleration of Cotter et al. (2011), which has suboptimal dependence on the minibatch size; and over the algorithm of Liu and Belkin (2018), which is limited to least squares problems and is also similarly suboptimal with respect to the minibatch size. Applied to interpolation learning, the improvement over Cotter et al. and Liu and Belkin translates to a linear, rather than square-root, parallelization speedup.
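To make the setting concrete, below is a minimal sketch of minibatch SGD with averaged stochastic gradient estimates, the basic primitive the abstract refers to. This is not the accelerated, optimistic algorithm analyzed in the paper; it only illustrates how the minibatch size b enters the gradient estimate and why the b per-sample gradients can be computed in parallel. The function names (`minibatch_sgd`, `grad_sample`) and the NumPy-based least-squares example are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def minibatch_sgd(grad_sample, x0, stepsize, n_steps, batch_size, rng=None):
    """Plain minibatch SGD: at each step, average `batch_size` independent
    stochastic gradient estimates and take a single gradient step.

    `grad_sample(x, rng)` must return one unbiased stochastic gradient at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # The batch_size per-sample gradients are independent, so in a
        # distributed setting they can be evaluated in parallel across workers.
        g = np.mean([grad_sample(x, rng) for _ in range(batch_size)], axis=0)
        x = x - stepsize * g
    return x


# Illustrative example (an assumption, not from the paper): a least-squares
# objective F(x) = E[(a^T x - y)^2 / 2] with noiseless labels, so the minimum
# expected loss is zero (the interpolation regime mentioned in the abstract).
if __name__ == "__main__":
    d = 5
    x_star = np.ones(d)

    def grad_sample(x, rng):
        a = rng.normal(size=d)
        y = a @ x_star              # noiseless labels: interpolating solution
        return (a @ x - y) * a      # per-sample gradient of the squared loss

    x_hat = minibatch_sgd(grad_sample, x0=np.zeros(d), stepsize=0.05,
                          n_steps=2000, batch_size=16)
    print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```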