Meta-learning of shared initialization parameters has been shown to be highly effective in solving few-shot learning tasks. However, extending the framework to many-shot scenarios, which may further enhance its practicality, has been relatively overlooked due to the technical difficulties of meta-learning over long chains of inner-gradient steps. In this paper, we first show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale task distributions, and thus yields better initialization points. Further, to increase the frequency of meta-updates even with excessively long inner-optimization trajectories, we propose to estimate the required shift of the task-specific parameters with respect to the change of the initialization parameters. By doing so, we can arbitrarily increase the frequency of meta-updates and thus greatly improve the meta-level convergence as well as the quality of the learned initializations. We validate our method on a heterogeneous set of large-scale tasks and show that the algorithm largely outperforms previous first-order meta-learning methods, as well as multi-task learning and fine-tuning baselines, in terms of both generalization performance and convergence.
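To make the core idea concrete, below is a minimal sketch (not the authors' code) of how frequent meta-updates can be interleaved with long inner-optimization trajectories by shifting the in-progress task-specific parameters along with the change of the initialization. The toy quadratic task losses, hyperparameters, and the Reptile/FOMAML-style first-order meta-gradient used here are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the "trajectory shifting" idea described above:
# interleave frequent first-order meta-updates with long inner trajectories,
# and shift the task-specific parameters by the same displacement applied to
# the initialization, so the inner trajectories never have to be restarted.
# Toy task losses and hyperparameters below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim = 8
num_tasks = 4
inner_lr, meta_lr = 0.05, 0.1
total_inner_steps = 200      # long inner trajectories (many-shot regime)
meta_update_every = 5        # meta-update frequency (can be set arbitrarily)

# Toy quadratic task losses: L_t(theta) = 0.5 * ||theta - c_t||^2
task_targets = [rng.normal(size=dim) for _ in range(num_tasks)]

def task_grad(theta, c):
    """Gradient of the toy quadratic task loss."""
    return theta - c

phi = np.zeros(dim)                              # shared initialization (meta-parameters)
thetas = [phi.copy() for _ in range(num_tasks)]  # task-specific parameters

for step in range(1, total_inner_steps + 1):
    # One inner gradient step for every task.
    for t in range(num_tasks):
        thetas[t] -= inner_lr * task_grad(thetas[t], task_targets[t])

    if step % meta_update_every == 0:
        # First-order (FOMAML-style) surrogate meta-gradient: average of the
        # task gradients evaluated at the current task-specific parameters.
        meta_grad = np.mean(
            [task_grad(thetas[t], task_targets[t]) for t in range(num_tasks)], axis=0
        )
        delta_phi = -meta_lr * meta_grad
        phi += delta_phi
        # Trajectory shifting: approximate the task-specific parameters that
        # would have resulted from starting at the updated initialization by
        # applying the same displacement to each in-progress trajectory.
        for t in range(num_tasks):
            thetas[t] += delta_phi
```

In this sketch the initialization is updated every few inner steps rather than once per full inner trajectory, which is the mechanism the abstract refers to as arbitrarily increasing the frequency of meta-updates.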