Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received considerable interest. Meta-learning leverages data from previous tasks to quickly learn a new task despite limited data. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods learn a good shared representation during training, there is no strong theoretical evidence of such behavior. More importantly, it is unclear whether these methods are truly model agnostic, i.e., whether they still learn a shared structure despite architecture misspecifications. To fill this gap, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns a linear shared representation. Moreover, this result holds despite misspecification: a network width larger than the hidden dimension of the shared representation does not harm the algorithm's performance. The learnt parameters then yield a small test loss after a single gradient step on any new task. Overall, this illustrates how well model-agnostic methods can adapt to any (unknown) model structure.
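To make the setting concrete, below is a minimal sketch of first-order ANIL on a two-layer linear network f(x) = w^T B x, in the spirit of the abstract. The task distribution, dimensions, and step sizes are illustrative assumptions, not the paper's exact choices; the point is only that the inner loop adapts the head w alone, while the outer loop updates both layers without differentiating through the inner adaptation (the first-order approximation).

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, k_star = 20, 10, 3        # input dim, network width, hidden dim of the true representation
n, alpha, beta = 25, 0.1, 0.05  # samples per task, inner and outer step sizes (illustrative)
B_star = rng.standard_normal((k_star, d)) / np.sqrt(d)  # ground-truth shared representation

def sample_task():
    """Linear regression task y = <theta, x> + noise, theta in the row span of B_star."""
    theta = B_star.T @ rng.standard_normal(k_star)
    X = rng.standard_normal((n, d))
    y = X @ theta + 0.01 * rng.standard_normal(n)
    return X, y

def head_grad(B, w, X, y):
    """Gradient of the squared loss w.r.t. the head w only."""
    r = X @ B.T @ w - y
    return B @ X.T @ r / n

# Meta-parameters: representation B (k x d) and head initialisation w (k,)
B = rng.standard_normal((k, d)) / np.sqrt(d)
w = np.zeros(k)

for _ in range(2000):
    X, y = sample_task()
    # Inner loop (ANIL): adapt only the head with one gradient step.
    w_task = w - alpha * head_grad(B, w, X, y)
    # Outer loop (first-order): take gradients at the adapted head,
    # treating w_task as a constant rather than backpropagating through it.
    r = X @ B.T @ w_task - y
    grad_B = np.outer(w_task, X.T @ r) / n
    grad_w = B @ X.T @ r / n
    B -= beta * grad_B
    w -= beta * grad_w
```

Under the abstract's result, the row span of the learnt B should align with that of the true low-dimensional representation even though the width k exceeds k_star, so a single head step on a fresh task already gives a small test loss.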