Covariate shift and outcome model heterogeneity are two prominent challenges in leveraging external sources to improve risk modeling for underrepresented cohorts in paucity of accurate labels. We consider the transfer learning problem targeting some unlabeled minority sample encountering (i) covariate shift to the labeled source sample collected on a different cohort; and (ii) outcome model heterogeneity with some majority sample informative to the targeted minority model. In this scenario, we develop a novel model-assisted and knowledge-guided transfer learning targeting underrepresented population (MAKEUP) approach for high-dimensional regression models. Our MAKEUP approach includes a model-assisted debiasing step in response to the covariate shift, accompanied by a knowledge-guided sparsifying procedure leveraging the majority data to enhance learning on the minority group. We also develop a model selection method to avoid negative knowledge transfer that can work in the absence of gold standard labels on the target sample. Theoretical analyses show that MAKEUP provides efficient estimation for the target model on the minority group. It maintains robustness to the high complexity and misspecification of the nuisance models used for covariate shift correction, as well as adaptivity to the model heterogeneity and potential negative transfer between the majority and minority groups. Numerical studies demonstrate similar advantages in finite sample settings over existing approaches. We also illustrate our approach through a real-world application about the transfer learning of Type II diabetes genetic risk models on some underrepresented ancestry group.
翻译:暂无翻译