Recently, the (gradient-based) bilevel programming framework has been widely used in hyperparameter optimization and has achieved excellent performance empirically. Previous theoretical work mainly focuses on its optimization properties, while leaving the analysis of generalization largely open. This paper attempts to address this issue by presenting an expectation bound w.r.t. the validation set based on uniform stability. Our results can explain some mysterious behaviours of bilevel programming in practice, for instance, overfitting to the validation set. We also present an expectation bound for the classical cross-validation algorithm. Our results suggest that, from a theoretical perspective, gradient-based algorithms can be better than cross-validation under certain conditions. Furthermore, we prove that regularization terms in both the outer and inner levels can alleviate the overfitting problem in gradient-based algorithms. In experiments on feature learning and data reweighting for noisy labels, we corroborate our theoretical findings.
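To make the setting concrete, below is a minimal sketch (not the paper's algorithm) of gradient-based bilevel hyperparameter optimization: the inner level fits model weights on the training loss, and the outer level tunes a hyperparameter on the validation loss by differentiating through the unrolled inner updates. All names (`w`, `log_lam`, step counts, data shapes) are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(64, 5), torch.randn(64, 1)    # toy training split
X_val, y_val = torch.randn(32, 5), torch.randn(32, 1)  # toy validation split

# Outer (hyperparameter) variable: log of an L2 regularization coefficient.
log_lam = torch.zeros(1, requires_grad=True)
outer_opt = torch.optim.Adam([log_lam], lr=1e-2)

def inner_loss(w, lam):
    # Inner objective: regularized training loss.
    return ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()

for outer_step in range(100):
    w = torch.zeros(5, 1, requires_grad=True)  # reinitialize inner variable
    lam = log_lam.exp()
    # Inner level: a few unrolled gradient steps on the training loss.
    # create_graph=True keeps the graph so lam receives a hypergradient.
    for _ in range(20):
        g, = torch.autograd.grad(inner_loss(w, lam), w, create_graph=True)
        w = w - 0.1 * g
    # Outer level: validation loss of the (approximately) fitted weights,
    # differentiated w.r.t. the hyperparameter through the inner trajectory.
    val_loss = ((X_val @ w - y_val) ** 2).mean()
    outer_opt.zero_grad()
    val_loss.backward()
    outer_opt.step()
```

In this sketch the outer updates are driven entirely by the validation loss, which is exactly why repeated outer steps can overfit the validation set, the phenomenon the bounds above are meant to explain; the `lam * (w ** 2).sum()` term in the inner objective illustrates the kind of inner-level regularization the paper argues can alleviate it.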