Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimisation, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in a Q-learning agent.
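The following is a minimal sketch of the bootstrapped meta-gradient idea described above, on a toy quadratic task with a meta-learned step size. The function names (`inner_step`, `matching_loss`) and hyperparameters are illustrative assumptions, not the paper's implementation; the sketch only shows the structure of bootstrapping a target and minimising a distance to it.

```python
# Minimal sketch of bootstrapped meta-gradients (assumed toy setup).
import jax
import jax.numpy as jnp

def loss(theta):
    # Toy learner objective (hypothetical stand-in for the task loss).
    return jnp.sum((theta - 3.0) ** 2)

def inner_step(theta, log_lr):
    # One learner update whose step size is produced by the meta-parameter.
    return theta - jnp.exp(log_lr) * jax.grad(loss)(theta)

def matching_loss(log_lr, theta0, K=3, L=5):
    # 1) Unroll K meta-parameterised updates (gradients flow into log_lr).
    theta = theta0
    for _ in range(K):
        theta = inner_step(theta, log_lr)
    # 2) Bootstrap a target: L further updates, held fixed via stop_gradient.
    target = theta
    for _ in range(L):
        target = inner_step(target, log_lr)
    target = jax.lax.stop_gradient(target)
    # 3) Meta-objective: distance to the target under a chosen metric
    #    (squared L2 here for simplicity).
    return jnp.sum((theta - target) ** 2)

# Meta-update: descend the matching loss with respect to the meta-parameter.
log_lr, theta0 = jnp.log(0.05), jnp.array([0.0])
for _ in range(20):
    g = jax.grad(matching_loss)(log_lr, theta0)
    log_lr = log_lr - 0.1 * g
print("meta-learned learning rate:", float(jnp.exp(log_lr)))
```

Note that the meta-gradient only backpropagates through the first K updates; the L bootstrap steps extend the effective horizon without entering the backward pass, which is the mechanism the abstract refers to.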