Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

Meta-learning has emerged as a successful method for improving training performance by training over many similar tasks, especially with deep neural networks (DNNs). However, the theoretical understanding of when and why overparameterized models such as DNNs generalize well in meta-learning remains limited. As an initial step towards addressing this challenge, this paper studies the generalization performance of overfitted meta-learning under a linear regression model with Gaussian features. In contrast to a few recent studies along the same line, our framework allows the number of model parameters to be arbitrarily larger than the number of features in the ground-truth signal, and hence naturally captures the overparameterized regime of practical deep meta-learning. We show that the overfitted min $\ell_2$-norm solution of model-agnostic meta-learning (MAML) can be beneficial, which is similar to the recent remarkable findings on the "benign overfitting" and "double descent" phenomena in classical (single-task) linear regression. However, because of features unique to meta-learning, such as task-specific gradient descent inner training and the diversity/fluctuation of the ground-truth signals among training tasks, we find new and interesting properties that do not exist in single-task linear regression. We first provide a high-probability upper bound (under reasonable tightness) on the generalization error, in which certain terms decrease as the number of features increases. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large. Under this circumstance, we show that the overfitted min $\ell_2$-norm solution can achieve an even lower generalization error than the underparameterized solution.
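To make the object of study concrete, the following is a minimal NumPy sketch of an overfitted min $\ell_2$-norm MAML solution in a linear model with Gaussian features. The abstract does not spell out the setup, so everything specific here is an assumption chosen for illustration: a single inner gradient step with step size `alpha`, a per-task training/validation split, squared loss, and the helper names `make_task` and `maml_min_norm`. With one inner step, each task's validation residual is linear in the meta-parameter `w`, so when the number of parameters `p` exceeds the total number of validation samples, the interpolating solution with the smallest norm can be obtained from a pseudoinverse.

```python
import numpy as np


def make_task(rng, w_task, p, n_train, n_val, noise_std):
    """Sample one task: p Gaussian features, only the first s carry signal (assumed setup)."""
    s = w_task.shape[0]

    def sample(n):
        X = rng.standard_normal((n, p))
        y = X[:, :s] @ w_task + noise_std * rng.standard_normal(n)
        return X, y

    return sample(n_train), sample(n_val)


def maml_min_norm(tasks, alpha):
    """Min l2-norm meta-parameter that zeroes the one-step MAML validation loss.

    Inner step on the task training set (loss ||y - Xw||^2 / (2 n_t)):
        w' = w - (alpha / n_t) * X_t^T (X_t w - y_t)
    Validation residual: y_v - X_v w' = g - B w, which is linear in w.
    """
    blocks, targets = [], []
    for (X_t, y_t), (X_v, y_v) in tasks:
        n_t = X_t.shape[0]
        B = X_v - (alpha / n_t) * (X_v @ X_t.T) @ X_t
        g = y_v - (alpha / n_t) * (X_v @ (X_t.T @ y_t))
        blocks.append(B)
        targets.append(g)
    B = np.vstack(blocks)
    gamma = np.concatenate(targets)
    # For an underdetermined consistent system, pinv gives the min-norm solution.
    w_hat = np.linalg.pinv(B) @ gamma
    return w_hat, B, gamma


# Illustrative values only; none of these numbers come from the paper.
rng = np.random.default_rng(0)
s, p, m = 5, 200, 5                       # true features, model parameters, tasks
w0 = rng.standard_normal(s)               # shared part of the ground truth
tasks = [
    make_task(rng, w0 + 0.5 * rng.standard_normal(s), p,
              n_train=10, n_val=4, noise_std=0.5)
    for _ in range(m)
]
w_hat, B, gamma = maml_min_norm(tasks, alpha=0.01)
print("max validation residual:", np.max(np.abs(B @ w_hat - gamma)))
print("norm of meta-parameter:", np.linalg.norm(w_hat))
```

Here `p = 200` is much larger than the `m * n_val = 20` stacked validation samples, so the meta-training loss is driven to zero (the overfitted regime). The per-task perturbation added to `w0` is a stand-in for the diversity/fluctuation of the ground-truth signals mentioned above.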
