Minimax optimization is gaining increasing attention in modern machine learning applications. Driven by large-scale models and massive volumes of data collected from edge devices, as well as the concern to preserve client privacy, communication-efficient distributed minimax optimization algorithms become popular, such as Local Stochastic Gradient Descent Ascent (Local-SGDA), and Local Decentralized SGDA (Local-DSGDA). While most existing research on distributed minimax algorithms focuses on convergence rates, computation complexity, and communication efficiency, the generalization performance remains underdeveloped, whereas generalization ability is a pivotal indicator for evaluating the holistic performance of a model when fed with unknown data. In this paper, we propose the stability-based generalization analytical framework for Distributed-SGDA, which unifies two popular distributed minimax algorithms including Local-SGDA and Local-DSGDA, and conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics under various settings, e.g., (S)C-(S)C, PL-SC, and NC-NC cases. Our theoretical results reveal the trade-off between the generalization gap and optimization error and suggest hyperparameters choice to obtain the optimal population risk. Numerical experiments for Local-SGDA and Local-DSGDA validate the theoretical results.
翻译:暂无翻译