Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when, for fixed values of the upper-level variables, the lower-level problem of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without the LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.
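For context, the bilevel program in question can be sketched as follows; the symbols $\lambda$ (upper-level hyperparameters), $w$ (lower-level model parameters), $F$ (validation loss), and $f$ (training loss) are illustrative placeholders, not necessarily the paper's notation:

$$
\min_{\lambda \in \Lambda,\; w} \; F(\lambda, w)
\quad \text{s.t.} \quad
w \in \operatorname*{arg\,min}_{w'} \; f(\lambda, w'),
$$

where $F$ measures validation error and $f$ is the (possibly regularized) training objective. The LLSC and LLS assumptions require $f(\lambda, \cdot)$ to be strongly convex and smooth for every fixed $\lambda$; a nonsmooth regularizer such as an $\ell_1$ penalty, for example, violates the smoothness requirement.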