Recently, a large amount of work has been devoted to the study of Markov chain stochastic gradient methods (MC-SGMs), mainly focusing on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error, which extends the existing results for uniform stability \cite{lei2021stability}. We further derive the first nearly optimal convergence rates for convex-concave problems, both in expectation and with high probability, which, combined with our stability results, show that optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.
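As a toy illustration (not taken from the paper), the defining feature of an MC-SGM can be sketched as an ordinary SGD loop in which the index of the sampled gradient evolves along a Markov chain instead of being drawn i.i.d. The function name `mc_sgm`, the lazy random walk over the sample indices, and all parameter values below are our own illustrative choices for a one-dimensional least-squares objective.

```python
import random

# Illustrative sketch of an MC-SGM update for empirical risk
# minimization: the sampled index follows a Markov chain (here a
# lazy random walk on {0, ..., n-1}), not an i.i.d. draw.

def mc_sgm(data, steps=5000, eta=0.02, seed=0):
    """Minimize the empirical risk (1/n) * sum_i (w - data[i])^2 / 2
    with gradients sampled along a Markov chain over the indices."""
    rng = random.Random(seed)
    n = len(data)
    w = 0.0
    i = rng.randrange(n)          # initial state of the chain
    for _ in range(steps):
        grad = w - data[i]        # gradient of (w - data[i])^2 / 2
        w -= eta * grad
        # Markov transition: stay put, or move to a neighbour (mod n);
        # the stationary distribution is uniform over the indices.
        i = (i + rng.choice([-1, 0, 1])) % n
    return w

data = [1.0, 2.0, 3.0, 4.0]
w_hat = mc_sgm(data)
# Once the chain has mixed, w_hat should hover near the empirical
# risk minimizer, i.e. the mean of the data.
```

Because the chain's stationary distribution is uniform, the gradient is unbiased only asymptotically; the correlation between consecutive samples is exactly what makes the convergence and stability analysis of MC-SGMs harder than in the i.i.d. setting.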