Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from high computational complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm by identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computational complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a {\L}ojasiewicz-type gradient inequality. Experiments on hyper-parameter optimization demonstrate the effectiveness of our algorithm.
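To make the algorithmic template concrete, below is a minimal NumPy sketch of a proximal-gradient outer loop whose hypergradient is computed via AID, with the implicit linear system solved by Nesterov-accelerated gradient steps. This is an illustration only: the toy convex-quadratic lower-level problem, the $\ell_1$ regularizer, and all names and constants (A, y_tgt, mu, lam, eta, the inner step sizes) are assumptions for the example, not the paper's actual algorithm, problem instances, or experimental setup.

```python
import numpy as np

# Toy bi-level problem (illustrative assumption, not the paper's setup):
#   upper level: min_x  f(x, y*(x)) + lam * ||x||_1
#   lower level: y*(x) = argmin_y  0.5*||y - A x||^2 + 0.5*mu*||y||^2
# with f(x, y) = 0.5*||y - y_tgt||^2, so grad_x f = 0 and grad_y f = y - y_tgt.

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y_tgt = rng.standard_normal(20)
mu, lam, eta = 0.5, 0.05, 0.02

def lower_solution(x):
    # Closed-form lower-level minimizer: ((1 + mu) I) y = A x
    return A @ x / (1.0 + mu)

def nesterov_linear_solve(H_mv, b, n_iters=50, alpha=0.1, beta=0.9):
    """Approximately solve H v = b by accelerated gradient descent on
    0.5 * v^T H v - b^T v, mimicking the momentum-accelerated implicit
    gradient computation in AID (step sizes are illustrative)."""
    v = np.zeros_like(b)
    v_prev = v.copy()
    for _ in range(n_iters):
        z = v + beta * (v - v_prev)        # Nesterov extrapolation
        grad = H_mv(z) - b                 # gradient of the quadratic
        v_prev, v = v, z - alpha * grad
    return v

def aid_hypergradient(x):
    # AID hypergradient: grad_x f - (grad_xy g) [grad_yy g]^{-1} grad_y f
    y = lower_solution(x)
    grad_y_f = y - y_tgt                   # nabla_y f(x, y*)
    H_mv = lambda v: (1.0 + mu) * v        # nabla_yy g = (1 + mu) I here
    v = nesterov_linear_solve(H_mv, grad_y_f)
    # nabla_xy g = -A^T for this toy problem, so the hypergradient is A^T v
    return A.T @ v

def prox_l1(z, t):
    # Proximal operator of t * ||.||_1 (soft thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Proximal gradient outer loop on the upper-level variable
x = np.zeros(10)
for _ in range(200):
    x = prox_l1(x - eta * aid_hypergradient(x), eta * lam)

print("upper-level objective:",
      0.5 * np.linalg.norm(lower_solution(x) - y_tgt) ** 2
      + lam * np.abs(x).sum())
```

In this sketch the lower-level Hessian is a scaled identity, so the inner solve is trivial; in the regime the abstract targets, the same accelerated inner loop is what amortizes the cost of the implicit linear system when the lower-level problem is ill-conditioned (large $\kappa$).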