On Stability and Generalization of Bilevel Optimization Problem

From arXiv. Note from the authors: This paper currently contains unresolved technical flaws that could mislead readers. We are committed to addressing these issues and improving the paper in the future.

(Stochastic) bilevel optimization is a frequently encountered problem in machine learning, with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most existing studies of this problem focus only on analyzing convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behavior. In this paper, we conduct a thorough analysis of the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms, and give a high-probability generalization bound that improves the previous best one from $\mathcal{O}(\sqrt{n})$ to $\mathcal{O}(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case where both inner and outer level parameters are subject to continuous update, whereas existing work allows only the outer level parameter to be updated. Our analysis applies in various standard settings such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate how iterations can affect the generalization error through experiments on meta-learning and hyper-parameter optimization.
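
For orientation, the bilevel problem is often written in the following generic form. This is a sketch under assumed notation: the symbols $x$, $y$, $F$, $G$, $F_S$, and $A$ do not appear in the abstract and are introduced here only for illustration.

$$
\min_{x} \; F\bigl(x, y^*(x)\bigr) \quad \text{s.t.} \quad y^*(x) \in \arg\min_{y} \; G(x, y)
$$

Here $x$ plays the role of the outer level parameter and $y$ the inner level parameter. Under the same assumed notation, if an algorithm $A$ trained on a sample $S$ of size $n$ outputs $(x_S, y_S) = A(S)$, one common way to read its generalization error is as the gap between the population outer objective $F$ and its empirical counterpart $F_S$ at that output:

$$
\epsilon_{\mathrm{gen}} = F(x_S, y_S) - F_S(x_S, y_S)
$$

The stability-based bounds mentioned in the abstract would control a quantity of this kind as a function of $n$; the exact definitions used by the paper may differ.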
