An overarching goal in machine learning is to build models that generalize from few samples. To this end, overparameterization has attracted immense interest as an explanation for the generalization ability of deep networks, even when the dataset is smaller than the model. While the prior literature focuses on the classical supervised setting, this paper aims to demystify overparameterization for meta-learning. We consider a sequence of linear-regression tasks and ask: (1) Given the earlier tasks, what is the optimal linear representation of features for a new downstream task? and (2) How many samples do we need to build this representation? This work shows that, surprisingly, overparameterization arises as a natural answer to both of these fundamental meta-learning questions. Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization that promotes inductive bias. We leverage this inductive bias to explain how the downstream task actually benefits from overparameterization, in contrast to prior works on few-shot learning. For (2), we develop a theory explaining how feature covariance can implicitly reduce the sample complexity well below the degrees of freedom and lead to small estimation error. We then integrate these findings to obtain an overall performance guarantee for our meta-learning algorithm. Numerical experiments on real and synthetic data verify our insights on overparameterized meta-learning.
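To make the setting concrete, the following is a minimal sketch of linear-regression meta-learning, not the paper's exact algorithm: it assumes Gaussian features and uses an illustrative moment-based subspace estimator. Earlier tasks share a low-dimensional representation; we estimate it from their pooled statistics and then solve a new downstream task inside that representation with far fewer samples than the ambient dimension.

```python
# Minimal sketch (illustrative, not the paper's method): task parameters
# lie in a shared r-dimensional subspace of a d-dimensional feature space.
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3            # ambient dimension, shared representation rank
T, n_task = 40, 25      # number of earlier tasks, samples per task
n_new = 10              # few samples for the downstream task (n_new << d)

# Ground-truth shared representation: an orthonormal d x r basis.
B_true, _ = np.linalg.qr(rng.standard_normal((d, r)))

def sample_task(n):
    """One linear-regression task whose parameter lies in the subspace."""
    beta = B_true @ rng.standard_normal(r)
    X = rng.standard_normal((n, d))          # isotropic Gaussian features
    y = X @ beta + 0.1 * rng.standard_normal(n)
    return X, y, beta

# Meta-training: with E[x x^T] = I, h = X^T y / n is an unbiased estimate
# of each task vector even when n_task < d; averaging the outer products
# h h^T concentrates around the shared subspace.
M = np.zeros((d, d))
for _ in range(T):
    X, y, _ = sample_task(n_task)
    h = X.T @ y / n_task
    M += np.outer(h, h) / T
_, eigvecs = np.linalg.eigh(M)               # ascending eigenvalues
B_hat = eigvecs[:, -r:]                      # top-r eigenvectors: learned basis

# Downstream task: regress in the learned r-dimensional representation,
# so only ~r (not ~d) samples are needed.
X_new, y_new, beta_new = sample_task(n_new)
Z = X_new @ B_hat                            # project features onto basis
w, *_ = np.linalg.lstsq(Z, y_new, rcond=None)
beta_hat = B_hat @ w

# Baseline: naive least squares in the full d-dimensional space.
beta_naive, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)
print("error in learned representation:", np.linalg.norm(beta_hat - beta_new))
print("error of naive full-dim regression:", np.linalg.norm(beta_naive - beta_new))
```

With n_new well below d, regression inside the learned representation recovers the new task's parameter accurately, while naive full-dimensional least squares does not; this mirrors the sample-complexity gap the abstract describes.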