Learning to optimize (L2O), which automates the design of optimizers through data-driven approaches, has gained increasing popularity. However, current L2O methods often suffer from poor generalization in at least two respects: (i) applying an L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of optimizers''); and (ii) the test performance of an optimizee (itself a machine learning model) trained by the learned optimizer, in terms of accuracy on unseen data (optimizee generalization, or ``learning to generalize''). While optimizer generalization has recently been studied, optimizee generalization (learning to generalize) has not been rigorously examined in the L2O context; that is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, thereby unifying their roles as equivalent metrics of loss-landscape flatness in the hand-crafted design of generalizable optimizers. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework, so that optimizers are meta-trained to learn to generalize, and we theoretically show that such generalization ability can be learned during L2O meta-training and then transferred to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals, with substantially improved generalization across multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.
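To make the flatness claim concrete, here is a minimal sketch of the kind of entropy-Hessian connection the abstract refers to, using the local-entropy definition popularized by Entropy-SGD (Chaudhari et al.); the exact constants and assumptions in the paper's theorem may differ:

% Local entropy of a loss $f$ at weights $w$ with scope parameter $\gamma > 0$
% (Entropy-SGD-style definition; the paper's exact statement may differ):
\begin{align*}
  F(w;\gamma) &= \log \int_{\mathbb{R}^d}
      \exp\!\Big(-f(w') - \tfrac{\gamma}{2}\,\|w - w'\|_2^2\Big)\, \mathrm{d}w' .
\end{align*}
% A Laplace (second-order) approximation around a local minimizer $w^\star$ gives
\begin{align*}
  -F(w^\star;\gamma) &\approx f(w^\star)
      + \tfrac{1}{2}\,\log\det\!\big(I + \gamma^{-1}\,\nabla^2 f(w^\star)\big)
      + \mathrm{const},
\end{align*}
% so smaller Hessian eigenvalues (a flatter minimum) imply higher local entropy:
% the two quantities act as equivalent flatness metrics.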
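As a hedged illustration of how a flatness metric could enter L2O meta-training, below is a self-contained PyTorch sketch, not the paper's implementation: a toy coordinate-wise LSTM optimizer is meta-trained on random quadratics, with a Hutchinson trace-of-Hessian estimate added to the unrolled meta-loss as a flatness-aware regularizer. The quadratic optimizee, the flatness_weight coefficient, and the trace proxy are illustrative assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)

class LSTMOptimizer(nn.Module):
    """Tiny coordinate-wise learned optimizer (L2O-style); illustrative only."""
    def __init__(self, hidden=8):
        super().__init__()
        self.hidden = hidden
        self.rnn = nn.LSTMCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        h, c = self.rnn(grad.unsqueeze(-1), state)  # one cell per coordinate
        return self.out(h).squeeze(-1), (h, c)

def hessian_trace_penalty(loss_fn, w, n_probes=4):
    """Hutchinson estimate of tr(Hessian) via Hessian-vector products,
    a simple differentiable proxy for sharpness (an assumption here;
    the paper's regularizers are the local entropy and the Hessian)."""
    grad = torch.autograd.grad(loss_fn(w), w, create_graph=True)[0]
    est = 0.0
    for _ in range(n_probes):
        v = (torch.randint(0, 2, w.shape) * 2 - 1).to(w.dtype)  # Rademacher probe
        hvp = torch.autograd.grad(grad @ v, w, create_graph=True)[0]
        est = est + v @ hvp  # v^T H v, an unbiased sample of tr(H)
    return est / n_probes

d, unroll, flatness_weight = 5, 10, 0.1
opt_net = LSTMOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)

for step in range(100):
    # Sample a random strongly convex quadratic as the optimizee.
    M = torch.randn(d, d)
    A = M @ M.t() / d + torch.eye(d)
    b = torch.randn(d)
    f = lambda w: 0.5 * w @ (A @ w) - b @ w

    w = torch.randn(d, requires_grad=True)
    state = (torch.zeros(d, opt_net.hidden), torch.zeros(d, opt_net.hidden))
    meta_loss = 0.0
    for _ in range(unroll):
        grad = torch.autograd.grad(f(w), w, create_graph=True)[0]
        update, state = opt_net(grad, state)
        w = w + update
        # Flatness-aware meta-objective: loss value plus a sharpness proxy.
        meta_loss = meta_loss + f(w) + flatness_weight * hessian_trace_penalty(f, w)

    meta_opt.zero_grad()
    meta_loss.backward()  # backprop through the whole unrolled trajectory
    meta_opt.step()

Because the penalty is built with create_graph=True, its gradient flows back through the unrolled updates into the optimizer's parameters, which is what lets the flatness preference be learned at meta-training time rather than merely enforced at test time.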