In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. In practice, however, these learned optimizers fail to generalize to reinforcement learning tasks, whose loss landscapes are unstable and complex. Moreover, neither hand-designed nor learned optimizers have been designed specifically for the unique optimization properties of reinforcement learning. In this work, we take a data-driven approach and use meta-learning to learn to optimize for reinforcement learning. We introduce a novel optimizer structure that significantly improves the training efficiency of learned optimizers, making it possible to learn an optimizer for reinforcement learning from scratch. Although trained only on toy tasks, our learned optimizer generalizes to unseen, complex tasks. Finally, we design a set of small gridworlds to train the first general-purpose optimizer for reinforcement learning.