Bilevel optimization has emerged as a powerful tool for solving many modern machine learning problems. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations, which can be very costly in practice. In this work, we propose a novel Hessian-free bilevel algorithm, which adopts the Evolution Strategies (ES) method to approximate the response Jacobian matrix in the hypergradient of the bilevel problem, and hence fully eliminates all second-order computations. We call our algorithm ESJ (which stands for the ES-based Jacobian method) and further extend it to the stochastic setting as ESJ-S. Theoretically, we show that both ESJ and ESJ-S are guaranteed to converge. Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. In particular, in our experiment on few-shot meta-learning with a ResNet-12 network on the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame.
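To make the idea concrete, below is a minimal, illustrative sketch (not the paper's exact algorithm) of how an ES-style zeroth-order estimator can replace the Jacobian-vector product in the hypergradient. It assumes a toy quadratic bilevel problem, and the names `inner_solution` and `es_hypergradient` are hypothetical helpers introduced only for this example; the inner solution map is treated as a black box that is perturbed along random Gaussian directions, so no second-order derivatives are ever formed.

```python
import numpy as np

# Toy bilevel problem (illustrative only, not from the paper):
#   outer: min_x f(x, y*(x)) = 0.5 * ||y*(x) - b||^2
#   inner: y*(x) = argmin_y 0.5 * ||y - A x||^2   (so y*(x) = A x exactly)
rng = np.random.default_rng(0)
dx, dy = 5, 4
A = rng.standard_normal((dy, dx))
b = rng.standard_normal(dy)

def inner_solution(x, steps=50, lr=0.2):
    """Approximate y*(x) by gradient descent on the inner objective."""
    y = np.zeros(dy)
    for _ in range(steps):
        y -= lr * (y - A @ x)          # inner gradient step
    return y

def outer_grad_y(y):
    """Gradient of the outer objective with respect to y."""
    return y - b

def es_hypergradient(x, num_samples=64, mu=1e-2):
    """ES-style (zeroth-order) estimate of the hypergradient term
    (dy*/dx)^T grad_y f, avoiding Jacobian- and Hessian-vector products."""
    y0 = inner_solution(x)
    v = outer_grad_y(y0)               # vector the Jacobian is contracted with
    est = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(dx)    # random Gaussian search direction
        y_pert = inner_solution(x + mu * u)
        # finite-difference estimate of ((dy*/dx) u)^T v, scattered back along u
        est += u * ((y_pert - y0) @ v) / mu
    return est / num_samples           # direct grad_x f term is zero in this toy problem

# Sanity check against the analytic hypergradient A^T (A x - b)
x = rng.standard_normal(dx)
print("ES estimate :", es_hypergradient(x))
print("analytic    :", A.T @ (A @ x - b))
```

Because the inner solution map is only queried, not differentiated, the same pattern extends to inner problems solved by a few gradient steps, which is what makes the approach fully Hessian-free.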