In this paper, we study a large-scale multi-agent minimax optimization problem that models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, in which the overall objective approximates a true population minimax risk using statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. We then focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication at every iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent-ascent algorithm that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that it generally cannot guarantee exact convergence with constant stepsizes and thus suffers from slow convergence rates. To address this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When the local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximate solution within $\mathcal{O}(\log (1/\epsilon))$ rounds of communication, matching the time complexity of the centralized GDA method. Finally, we show numerically that FedGDA-GT outperforms Local SGDA.