We study the transfer learning process between two linear regression problems. An important and timely special case is when the regressors are overparameterized and perfectly interpolate their training data. We examine a parameter transfer mechanism whereby a subset of the parameters of the target task solution are constrained to the values learned for a related source task. We analytically characterize the generalization error of the target task in terms of the salient factors in the transfer learning architecture, i.e., the number of examples available, the number of (free) parameters in each of the tasks, the number of parameters transferred from the source to target task, and the relation between the two tasks. Our non-asymptotic analysis shows that the generalization error of the target task follows a two-dimensional double descent trend (with respect to the number of free parameters in each of the tasks) that is controlled by the transfer learning factors. Our analysis points to specific cases where the transfer of parameters is beneficial as a substitute for extra overparameterization (i.e., additional free parameters in the target task). Specifically, we show that the usefulness of a transfer learning setting is fragile and depends on a delicate interplay among the set of transferred parameters, the relation between the tasks, and the true solution. We also demonstrate that overparameterized transfer learning is not necessarily more beneficial when the source task is closer or identical to the target task.
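To make the parameter transfer mechanism concrete, the following is a minimal NumPy sketch of the setting described above: a min-norm (pseudo-inverse) interpolating fit for the source task, after which a subset of the target task's parameters is frozen to the source-learned values and only the remaining free parameters are fit to the target data. All dimensions, the number of transferred coordinates, the noiseless data model, and the choice of which indices are transferred are hypothetical illustration values, not the paper's exact experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (both tasks are overparameterized: n < d).
d = 100                   # ambient dimension
n_src, n_tgt = 40, 30     # examples in the source / target task
beta_src = rng.normal(size=d) / np.sqrt(d)      # true source parameters
beta_tgt = beta_src + 0.1 * rng.normal(size=d)  # related target parameters

def min_norm_fit(X, y):
    """Minimum-L2-norm interpolating least-squares solution (pseudo-inverse)."""
    return np.linalg.pinv(X) @ y

# Source task: learn an interpolating solution.
X_src = rng.normal(size=(n_src, d))
y_src = X_src @ beta_src
w_src = min_norm_fit(X_src, y_src)

# Target task: transfer (freeze) the first k coordinates from the source
# solution and fit only the remaining d - k free coordinates.
k = 30
transferred = np.arange(k)   # indices constrained to the source values
free = np.arange(k, d)       # indices learned from the target data

X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ beta_tgt
residual = y_tgt - X_tgt[:, transferred] @ w_src[transferred]
w_free = min_norm_fit(X_tgt[:, free], residual)

w_tgt = np.empty(d)
w_tgt[transferred] = w_src[transferred]
w_tgt[free] = w_free

# Generalization error of the target task: for fresh isotropic Gaussian
# inputs, the expected squared prediction error equals the squared
# parameter-estimation error.
gen_error = np.sum((w_tgt - beta_tgt) ** 2)
print(f"target generalization error: {gen_error:.4f}")
```

Varying `k` (the number of transferred parameters) and `d - k` (the number of free target parameters) in this sketch is one way to probe, empirically, the two-dimensional double descent trend and the interplay between the transferred set and the task relation discussed above.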