Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and more memory efficient than previous work. Our model and training code are open source.
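To make the learned-optimizer setup concrete, below is a minimal, hypothetical sketch in JAX: a tiny per-parameter MLP stands in for a hand-designed update rule, and its parameters are meta-trained by differentiating the final training loss through a short unrolled inner run. All names, the architecture, and the toy quadratic task are illustrative assumptions, not the paper's actual model or API.

```python
# Hypothetical sketch of a per-parameter learned optimizer (not the paper's code).
import jax
import jax.numpy as jnp

def init_meta_params(key, hidden=4):
    # Tiny MLP: 2 input features (gradient, momentum) -> hidden -> 1 update.
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def learned_update(meta_params, grad, mom):
    # Stack per-parameter features; the same MLP is applied elementwise.
    feats = jnp.stack([grad, mom], axis=-1)                # (..., 2)
    h = jnp.tanh(feats @ meta_params["w1"] + meta_params["b1"])
    out = (h @ meta_params["w2"] + meta_params["b2"])[..., 0]
    return 0.01 * out                                      # small output scale

def inner_loss(params, batch):
    # Toy quadratic task standing in for a target model's training loss.
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)

def meta_loss(meta_params, init_params, batch, steps=10, beta=0.9):
    # Unroll the learned optimizer and report the final training loss.
    def step(carry, _):
        params, mom = carry
        grad = jax.grad(inner_loss)(params, batch)
        mom = beta * mom + (1 - beta) * grad
        params = params - learned_update(meta_params, grad, mom)
        return (params, mom), None
    carry0 = (init_params, jnp.zeros_like(init_params))
    (params, _), _ = jax.lax.scan(step, carry0, None, length=steps)
    return inner_loss(params, batch)

key = jax.random.PRNGKey(0)
meta_params = init_meta_params(key)
x = jax.random.normal(key, (32, 8))
true_w = jax.random.normal(jax.random.PRNGKey(1), (8,))
batch = (x, x @ true_w)
init_params = jnp.zeros(8)

# Meta-gradient: differentiate the unrolled training run with respect to the
# optimizer's own parameters, then take one meta-SGD step on them.
g = jax.grad(meta_loss)(meta_params, init_params, batch)
meta_params = jax.tree_util.tree_map(lambda p, gi: p - 0.1 * gi, meta_params, g)
print(meta_loss(meta_params, init_params, batch))
```

The trade-offs the abstract names enter through exactly the choices visible in this sketch: each per-parameter feature kept in optimizer state (here, one momentum buffer) adds memory proportional to the model size, and the width of the MLP applied at every step sets the per-step compute overhead.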