This paper reviews gradient-based techniques for solving bilevel optimization problems. Bilevel optimization is a general way to frame the learning of systems that are implicitly defined through a quantity they minimize. This characterization applies to neural networks, optimizers, algorithmic solvers, and even physical systems, and allows for greater modeling flexibility than an explicit definition of such systems. Here we focus on gradient-based approaches to solving such problems. We divide them into two categories: those rooted in implicit differentiation, and those that leverage the equilibrium propagation theorem. We present the mathematical foundations behind these methods, introduce the gradient-estimation algorithms in detail, and compare the competitive advantages of the different approaches.
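For concreteness, the generic bilevel problem reviewed here can be sketched as follows (the symbols $f_{\mathrm{out}}$, $f_{\mathrm{in}}$, $\theta$, and $\phi$ are illustrative notation introduced here, not necessarily the paper's own):

\[ \min_{\theta} \; f_{\mathrm{out}}\bigl(\theta, \phi^{*}(\theta)\bigr) \quad \text{subject to} \quad \phi^{*}(\theta) \in \operatorname*{arg\,min}_{\phi} \; f_{\mathrm{in}}(\theta, \phi). \]

The inner problem implicitly defines the system $\phi^{*}(\theta)$, and the central difficulty is computing $\mathrm{d} f_{\mathrm{out}} / \mathrm{d}\theta$ through that implicit dependence. Under standard regularity assumptions, implicit differentiation yields the hypergradient

\[ \frac{\mathrm{d} f_{\mathrm{out}}}{\mathrm{d}\theta} = \nabla_{\theta} f_{\mathrm{out}} - \nabla^{2}_{\theta\phi} f_{\mathrm{in}} \bigl( \nabla^{2}_{\phi\phi} f_{\mathrm{in}} \bigr)^{-1} \nabla_{\phi} f_{\mathrm{out}}, \]

with all derivatives evaluated at $\bigl(\theta, \phi^{*}(\theta)\bigr)$. Equilibrium propagation estimates the same quantity by comparing equilibria of a perturbed inner problem, rather than by an explicit linear solve.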