If the trend of learned components eventually outperforming their hand-crafted versions continues, learned optimizers will eventually outperform hand-crafted optimizers like SGD or Adam. Even if learned optimizers (L2Os) eventually outpace hand-crafted ones in practice, however, they are still not provably convergent and might fail out of distribution. These are the questions addressed here. Currently, learned optimizers frequently outperform generic hand-crafted optimizers (such as gradient descent) at the beginning of learning, but they generally plateau after some time, while the generic algorithms continue to make progress and often overtake the learned algorithm, much as Aesop's tortoise overtakes the hare. L2Os also still have a difficult time generalizing out of distribution. Heaton et al. (2020) proposed Safeguarded L2O (GL2O), which takes a learned optimizer and safeguards it with a generic learning algorithm so that, by conditionally switching between the two, the resulting algorithm is provably convergent. We propose a new class of Safeguarded L2O, called Loss-Guarded L2O (LGL2O), which is both conceptually simpler and computationally less expensive. The guarding mechanism decides solely based on the expected future loss value of both optimizers. Furthermore, we give a theoretical proof of LGL2O's convergence guarantee and present empirical results comparing it to GL2O and other baselines, showing that it combines the best of both L2O and SGD and, in practice, converges much better than GL2O.
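To make the loss-based guard concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes that the "expected future loss" can be approximated by the loss at each optimizer's one-step candidate, and it uses a hypothetical stand-in for the learned optimizer (`toy_learned_step`, a fixed-size sign step that is fast at first and stalls near the optimum). All function names, the quadratic loss, and the step sizes are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def quadratic_loss(x: Vector) -> float:
    """Toy objective f(x) = sum(x_i^2), chosen only for illustration."""
    return sum(xi * xi for xi in x)


def quadratic_grad(x: Vector) -> Vector:
    """Gradient of the toy objective."""
    return [2.0 * xi for xi in x]


def sgd_step(x: Vector, grad: Vector, lr: float = 0.05) -> Vector:
    """Plain gradient step, playing the role of the generic fallback optimizer."""
    return [xi - lr * gi for xi, gi in zip(x, grad)]


def toy_learned_step(x: Vector, grad: Vector) -> Vector:
    """Hypothetical stand-in for a learned optimizer (an assumption, not an L2O).

    It takes a fixed-size step of 0.5 against the gradient sign, so it moves
    quickly far from the optimum but overshoots once it gets close.
    """
    sign = lambda g: (g > 0) - (g < 0)
    return [xi - 0.5 * sign(gi) for xi, gi in zip(x, grad)]


def loss_guarded_step(
    x: Vector,
    loss_fn: Callable[[Vector], float],
    grad_fn: Callable[[Vector], Vector],
    learned_step: Callable[[Vector, Vector], Vector],
) -> Tuple[Vector, str]:
    """Propose one step from each optimizer and keep the lower-loss candidate."""
    grad = grad_fn(x)
    learned_candidate = learned_step(x, grad)
    sgd_candidate = sgd_step(x, grad)
    if loss_fn(learned_candidate) <= loss_fn(sgd_candidate):
        return learned_candidate, "learned"
    return sgd_candidate, "sgd"


def run_demo(steps: int = 20) -> Tuple[List[float], List[str]]:
    """Run the guarded optimizer and record losses and which optimizer was used."""
    x: Vector = [3.2]
    losses = [quadratic_loss(x)]
    choices: List[str] = []
    for _ in range(steps):
        x, chosen = loss_guarded_step(x, quadratic_loss, quadratic_grad, toy_learned_step)
        losses.append(quadratic_loss(x))
        choices.append(chosen)
    return losses, choices


if __name__ == "__main__":
    losses, choices = run_demo()
    print(choices)
    print(losses[-1])
```

In this toy setting the guard keeps the stand-in learned step while it makes faster progress and falls back to the gradient step once the learned step starts to overshoot, so the recorded loss never increases. Evaluating both candidates costs an extra loss evaluation per step, which is a property of this sketch and not a statement about the cost of LGL2O.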