Model-agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm that provides a good weight initialization for a model across a variety of learning tasks. A model initialized with these weights can be fine-tuned to an unseen task using only a small number of samples and a few adaptation steps. MAML is simple and versatile but requires costly learning-rate tuning and careful design of the task distribution, which limits its scalability and generalization. This paper proposes a more robust MAML, referred to as Robust MAML (RMAML), based on an adaptive learning scheme and a prioritization task buffer (PTB), to improve the scalability of the training process and alleviate the problem of distribution mismatch. RMAML uses gradient-based hyper-parameter optimization to automatically find the optimal learning rate and uses the PTB to gradually adjust the training task distribution toward the testing task distribution over the course of training. Experimental results on meta-reinforcement-learning environments demonstrate a substantial performance gain, as well as reduced sensitivity to hyper-parameter choices and robustness to distribution mismatch.
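To make the two mechanisms concrete, below is a minimal sketch of a MAML outer step with a meta-learned inner-loop learning rate and a prioritized task buffer. The abstract does not give the paper's exact formulation, so all names here (PrioritizedTaskBuffer, meta_step, log_inner_lr, the toy regression tasks, and the fixed priorities) are illustrative assumptions, not the authors' implementation.

```python
import torch

# Hypothetical sketch of the two mechanisms named in the abstract; the
# exact method is not specified there, so this is illustrative only.

class PrioritizedTaskBuffer:
    """Sketch of a PTB: tasks are stored with priorities and sampled in
    proportion to them, so raising the priority of test-like tasks shifts
    the training task distribution toward the testing distribution."""
    def __init__(self):
        self.tasks, self.priorities = [], []

    def add(self, task, priority):
        self.tasks.append(task)
        self.priorities.append(priority)

    def sample(self, k):
        p = torch.tensor(self.priorities, dtype=torch.float)
        idx = torch.multinomial(p / p.sum(), k, replacement=True)
        return [self.tasks[i] for i in idx]

def forward(params, x):
    # Toy linear model in functional form so adapted parameters can be used.
    w, b = params
    return x @ w + b

def task_loss(params, x, y):
    return ((forward(params, x) - y) ** 2).mean()

def meta_step(params, log_inner_lr, tasks, meta_opt):
    """One MAML outer step with a learnable inner-loop learning rate.
    Parameterizing log(lr) keeps the learning rate positive; because the
    inner update stays differentiable (create_graph=True), the outer
    gradient also flows into log_inner_lr -- gradient-based
    hyper-parameter optimization of the learning rate."""
    meta_opt.zero_grad()
    inner_lr = log_inner_lr.exp()
    outer_loss = 0.0
    for xs, ys, xq, yq in tasks:  # (support, query) data per task
        grads = torch.autograd.grad(task_loss(params, xs, ys),
                                    params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        outer_loss = outer_loss + task_loss(adapted, xq, yq)
    outer_loss.backward()  # updates the model weights AND the learning rate
    meta_opt.step()
    return float(outer_loss)

# Usage: the inner learning rate is meta-learned alongside the weights.
w = torch.nn.Parameter(torch.randn(3, 1))
b = torch.nn.Parameter(torch.zeros(1))
log_inner_lr = torch.nn.Parameter(torch.tensor(-2.3))  # exp(-2.3) ~ 0.1
meta_opt = torch.optim.Adam([w, b, log_inner_lr], lr=1e-3)

buffer = PrioritizedTaskBuffer()
for _ in range(8):  # fill the buffer with random linear-regression tasks
    xs, xq = torch.randn(10, 3), torch.randn(10, 3)
    true_w = torch.randn(3, 1)
    buffer.add((xs, xs @ true_w, xq, xq @ true_w), priority=1.0)

for step in range(100):
    loss = meta_step([w, b], log_inner_lr, buffer.sample(4), meta_opt)
```

In the full method the priorities would presumably be updated during training to favor tasks resembling the test distribution; they are held fixed here only to keep the sketch short.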