In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. When the set of transferable sources is known, we propose a transfer learning algorithm for GLMs and derive its $\ell_1/\ell_2$-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that, when the target and sources are sufficiently close to each other, these bounds can be improved over those of the classical penalized estimator that uses only target data, under mild conditions. When the transferable sources are unknown, we introduce an algorithm-free transferable source detection approach to detect informative sources, and we prove its detection consistency under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals for each coefficient component and provide the corresponding theory. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package, glmtrans, which is available on CRAN.