While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
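To make the idea of "an optimizer that is itself a small neural network" concrete, here is a minimal sketch in JAX of a per-parameter MLP that ingests gradient features and emits parameter updates. This is an illustrative assumption, not the released VeLO architecture: the feature set, network sizes, and function names (`init_lopt_params`, `lopt_update`) are hypothetical.

```python
# Minimal sketch of a learned optimizer update rule (illustrative, not VeLO itself).
import jax
import jax.numpy as jnp


def init_lopt_params(key, hidden=4):
    """Initialize a tiny update-rule MLP: 3 per-parameter features -> hidden -> 1 update."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (3, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }


def lopt_update(lopt_params, grads, params, momentum, decay=0.9):
    """Apply the learned update rule elementwise to every parameter tensor."""
    def per_tensor(g, p, m):
        # Hypothetical per-parameter features: gradient, momentum, current value.
        m_new = decay * m + (1.0 - decay) * g
        feats = jnp.stack([g, m_new, p], axis=-1)                  # (..., 3)
        h = jnp.tanh(feats @ lopt_params["w1"] + lopt_params["b1"])
        step = (h @ lopt_params["w2"] + lopt_params["b2"])[..., 0]
        return p - 1e-3 * step, m_new

    new_params, new_momentum = {}, {}
    for name in params:
        new_params[name], new_momentum[name] = per_tensor(
            grads[name], params[name], momentum[name]
        )
    return new_params, new_momentum
```

In a meta-training setup of this kind, the MLP's weights would themselves be optimized across many tasks so that the induced update rule works well without per-task hyperparameter tuning; the actual model and training pipeline are available at velo-code.github.io.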