A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with-replacement sampling. However, is random optimal? We show that this depends heavily on what functions we are optimizing, and the convergence gap between optimal and random permutations can vary from exponential to nonexistent. We first show that for 1-dimensional strongly convex functions with smooth second derivatives, there exist permutations that offer exponentially faster convergence compared to random. However, for general strongly convex functions, random permutations are optimal. Finally, we show that for quadratic strongly convex functions, there are easy-to-construct permutations that lead to accelerated convergence compared to random. Our results suggest that a general convergence characterization of optimal permutations cannot capture the nuances of individual function classes, and can mistakenly indicate that one cannot do much better than random.
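To make the algorithmic contrast concrete, below is a minimal sketch (not from the paper) of the two sampling schemes the abstract compares: with-replacement SGD versus permutation-based SGD (random reshuffling), run on a toy 1-dimensional quadratic sum whose minimizer is the mean of the data. All names and constants here are illustrative assumptions; the paper's carefully constructed non-random permutations are not reproduced.

```python
# Toy objective: f(x) = (1/n) * sum_i (x - b_i)^2 / 2, minimized at mean(b).
# Illustrative sketch only; hyperparameters are arbitrary assumptions.
import random

def sgd_epochs(b, x0, lr, epochs, permute, seed=0):
    """Run SGD where each epoch takes n component-gradient steps.

    permute=True  -> draw a fresh random permutation each epoch
                     (permutation-based SGD / random reshuffling).
    permute=False -> draw indices i.i.d. with replacement.
    """
    rng = random.Random(seed)
    n, x = len(b), x0
    for _ in range(epochs):
        if permute:
            order = list(range(n))
            rng.shuffle(order)
        else:
            order = [rng.randrange(n) for _ in range(n)]
        for i in order:
            x -= lr * (x - b[i])  # gradient of (x - b_i)^2 / 2
    return x

b = [float(i) for i in range(10)]  # minimizer is mean(b) = 4.5
for permute in (False, True):
    x = sgd_epochs(b, x0=0.0, lr=0.05, epochs=50, permute=permute)
    print(f"permute={permute}: |x - 4.5| = {abs(x - 4.5):.2e}")
```

On toy instances like this, the reshuffled variant typically ends an epoch closer to the minimizer, consistent with the phenomenon the abstract cites; whether a fixed, cleverly chosen permutation can do even better is exactly the question the paper studies.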