Representation learning has been widely studied in the context of meta-learning, enabling rapid learning of new tasks through shared representations. Recent works such as MAML have explored using fine-tuning-based metrics, which measure the ease with which fine-tuning can achieve good performance, as proxies for obtaining representations. We present a theoretical framework for analyzing representations derived from a MAML-like algorithm, assuming the available tasks use approximately the same underlying representation. We then provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure. The upper bound applies to general function classes, which we demonstrate by instantiating the guarantees of our framework in the logistic regression and neural network settings. In contrast, we establish the existence of settings where any algorithm, using a representation trained with no consideration for task-specific fine-tuning, performs in the worst case no better than a learner with no access to the source tasks. This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
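Since the abstract contrasts fine-tuning-based objectives with "frozen representation" ones, the following is a minimal numerical sketch of the distinction: a representation is scored by the risk it achieves *after* a few gradient-descent steps of task-specific fine-tuning. The linear model, dimensions, step sizes, and function names are illustrative assumptions and not the paper's construction.

```python
import numpy as np

# Sketch (not the paper's exact algorithm): evaluate a representation B by how
# well a task-specific head w performs after a few gradient steps of
# fine-tuning, i.e., the quantity a MAML-like objective optimizes.
rng = np.random.default_rng(0)
d, k, n_tasks, n_samples = 20, 5, 10, 30

# Shared ground-truth representation and a collection of source tasks.
B_star = rng.normal(size=(k, d)) / np.sqrt(d)
tasks = []
for _ in range(n_tasks):
    w_t = rng.normal(size=k)                       # task-specific head
    X = rng.normal(size=(n_samples, d))
    y = X @ B_star.T @ w_t + 0.1 * rng.normal(size=n_samples)
    tasks.append((X, y))

def finetune_head(B, X, y, steps=5, lr=0.1):
    """Fine-tune a task-specific head on top of representation B via gradient descent."""
    Z = X @ B.T
    w = np.zeros(B.shape[0])
    for _ in range(steps):
        grad = Z.T @ (Z @ w - y) / len(y)
        w -= lr * grad
    return w

def post_finetuning_risk(B, tasks):
    """Average squared-error risk of the fine-tuned predictors across tasks."""
    losses = []
    for X, y in tasks:
        w = finetune_head(B, X, y)
        losses.append(np.mean((X @ B.T @ w - y) ** 2))
    return float(np.mean(losses))

# A MAML-like method trains B to make this post-fine-tuning risk small; the
# separation result says objectives that ignore the fine-tuning step can, in
# the worst case, transfer no useful structure.
B_init = rng.normal(size=(k, d)) / np.sqrt(d)
print("post-fine-tuning risk, random B :", post_finetuning_risk(B_init, tasks))
print("post-fine-tuning risk, shared B*:", post_finetuning_risk(B_star, tasks))
```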