Robust angle-based transfer learning in high dimensions

Transfer learning aims to improve the performance of a target model by leveraging data from related source populations. It is known to be especially helpful when target data are insufficient. In this paper, we study how to train a high-dimensional ridge regression model with limited target data and existing models trained in heterogeneous source populations. We consider a practical setting where only the source model parameters are accessible, not the individual-level source data. In the setting with a single source model, we propose a novel, flexible angle-based transfer learning (angleTL) method, which leverages the concordance between the source and the target model parameters. We show that angleTL unifies several benchmark methods by construction, including the target-only model trained using target data alone, the source model trained using the source data, and the distance-based transfer learning method that incorporates the source model into target training by penalizing the difference between the target and source model parameters measured by the $L_2$ norm. We also provide algorithms to effectively incorporate multiple source models, accounting for the fact that some source models may be more helpful than others. Our high-dimensional asymptotic analysis provides interpretations and insights regarding when a source model can be helpful to the target model, and demonstrates the superiority of angleTL over other benchmark methods. We perform extensive simulation studies to validate our theoretical conclusions and show the feasibility of applying angleTL to transfer existing genetic risk prediction models across multiple biobanks.
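The abstract does not state the angleTL objective, so the following is only a minimal sketch under an assumed formulation: ridge regression with an extra inner-product term that rewards alignment between the target coefficients and a fixed source coefficient vector. The function name `ridge_with_source`, the weights `lam` and `eta`, and the simulated data are all hypothetical. Under this assumption, `eta = 0` gives target-only ridge and `eta = lam` gives the distance-based estimator with an $L_2$ penalty on the difference from the source, which illustrates how one family can contain both benchmarks.

```python
import numpy as np

def ridge_with_source(X, y, w, lam, eta):
    """Closed-form minimizer of the ASSUMED objective
    ||y - X b||^2 + lam * ||b||^2 - 2 * eta * (w @ b).
    This is an illustrative formulation, not taken from the paper."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    # Setting the gradient to zero gives (X'X + lam I) b = X'y + eta * w.
    return np.linalg.solve(A, X.T @ y + eta * w)

# Simulated high-dimensional data (p > n); all values are illustrative.
rng = np.random.default_rng(0)
n, p = 50, 200
beta = rng.normal(size=p) / np.sqrt(p)            # target coefficients
w = beta + 0.3 * rng.normal(size=p) / np.sqrt(p)  # related source coefficients
X = rng.normal(size=(n, p))
y = X @ beta + 0.5 * rng.normal(size=n)

lam = 10.0
target_only = ridge_with_source(X, y, w, lam, eta=0.0)  # ignores the source
distance_tl = ridge_with_source(X, y, w, lam, eta=lam)  # L2 penalty on b - w

# With a very heavy penalty and eta = lam, the estimate collapses to the source.
near_source = ridge_with_source(X, y, w, lam=1e8, eta=1e8)
```

In practice `lam` and `eta` would be chosen by cross-validation on the target data; how the paper tunes them is not described in the abstract.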

