Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop, which has led to several modifications being proposed. One such modification is \textit{first-order} BLO (FO-BLO) which approximates outer-level gradients by zeroing out second derivative terms, yielding significant speed gains and requiring only constant memory as $r$ varies. Despite FO-BLO's popularity, there is a lack of theoretical understanding of its convergence properties. We make progress by demonstrating a rich family of examples where FO-BLO-based stochastic optimization does not converge to a stationary point of the BLO objective. We address this concern by proposing a new FO-BLO-based unbiased estimate of outer-level gradients, enabling us to theoretically guarantee this convergence, with no harm to memory and expected time complexity. Our findings are supported by experimental results on Omniglot and Mini-ImageNet, popular few-shot meta-learning benchmarks.
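To make the distinction concrete, below is a minimal sketch (not the paper's code) contrasting the exact BLO hypergradient, which backpropagates through all $r$ inner steps, with the FO-BLO approximation, which zeroes out second-derivative terms by evaluating the outer gradient directly at the adapted parameters. The objectives, step size \texttt{alpha}, and loop length \texttt{r} are illustrative assumptions.

\begin{verbatim}
import jax
import jax.numpy as jnp

def inner_loss(w, task):   # lower-level (task-adaptation) objective; toy example
    return jnp.sum((w - task) ** 2)

def outer_loss(w, task):   # upper-level (validation) objective; toy example
    return jnp.sum((w - 2.0 * task) ** 2)

def adapt(theta, task, r=5, alpha=0.1):
    """Inner loop: r SGD steps on inner_loss, starting from outer params theta."""
    w = theta
    for _ in range(r):
        w = w - alpha * jax.grad(inner_loss)(w, task)
    return w

def exact_hypergrad(theta, task):
    """Exact BLO gradient: differentiate through the whole unrolled inner loop,
    keeping second-derivative terms (memory/time grow with r under plain autodiff)."""
    return jax.grad(lambda th: outer_loss(adapt(th, task), task))(theta)

def fo_blo_grad(theta, task):
    """FO-BLO: drop second-derivative terms, i.e. treat d(adapt)/d(theta) as the
    identity and simply evaluate the outer gradient at the adapted parameters."""
    w = adapt(theta, task)
    return jax.grad(outer_loss)(w, task)

theta = jnp.ones(3)
task = jnp.array([0.5, 1.0, 1.5])
print(exact_hypergrad(theta, task))  # unbiased, but cost scales with r
print(fo_blo_grad(theta, task))      # biased approximation, constant memory in r
\end{verbatim}

The sketch only illustrates why FO-BLO is cheap and where its bias comes from; the paper's contribution is an FO-BLO-based estimator that removes this bias without changing the memory or expected time complexity.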