We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely Random Reshuffling (RR), which shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-{\L}ojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of data-ordering attacks, where an adversary manipulates the order in which data points are supplied to the optimizer. Our analysis also recovers tight rates for the incremental gradient method, where the data points are not shuffled at all.
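To make the sampling strategies concrete, below is a minimal illustrative sketch (not from the paper) of stochastic gradient descent ascent on a toy strongly convex-strongly concave finite-sum quadratic, comparing with-replacement uniform sampling against the without-replacement schemes discussed above: Random Reshuffling (RR), Shuffle Once (SO), and the incremental gradient method (IG). The toy objective, the step size, the epoch count, and the helper names (`grad_i`, `run`, `order_fn`) are all hypothetical choices for illustration, not the paper's experimental setup.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's): GDA on the toy
# strongly convex-strongly concave finite-sum minimax problem
#   min_x max_y (1/n) * sum_i [ (a_i/2) x^2 + b_i x y - (c_i/2) y^2 ],
# whose unique saddle point is (0, 0).

rng = np.random.default_rng(0)
n = 32
a = rng.uniform(1.0, 2.0, n)   # per-component strong convexity in x
b = rng.uniform(-1.0, 1.0, n)  # coupling between x and y
c = rng.uniform(1.0, 2.0, n)   # per-component strong concavity in y

def grad_i(i, x, y):
    """Gradients (g_x, g_y) of the i-th summand."""
    return a[i] * x + b[i] * y, b[i] * x - c[i] * y

def run(order_fn, epochs=50, eta=0.02):
    """One GDA step per component per epoch; order_fn(epoch) gives the data order."""
    x, y = 1.0, 1.0
    for t in range(epochs):
        for i in order_fn(t):
            gx, gy = grad_i(i, x, y)
            x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
    return np.hypot(x, y)  # distance to the saddle point (0, 0)

perm = rng.permutation(n)  # one fixed permutation, reused by Shuffle Once
schemes = {
    "uniform (with replacement)": lambda t: rng.integers(0, n, n),
    "RR (reshuffle every epoch)": lambda t: rng.permutation(n),
    "SO (shuffle once)":          lambda t: perm,
    "IG (no shuffling)":          lambda t: range(n),
}
for name, order_fn in schemes.items():
    print(f"{name:28s} final distance to saddle: {run(order_fn):.2e}")
```

The `order_fn` hook also makes the data-ordering-attack setting easy to picture: an adversary simply supplies a chosen, rather than random, permutation each epoch.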