This paper is motivated by the joint analysis of genetic, imaging, and clinical (GIC) data collected in many large-scale biomedical studies, such as the UK Biobank study and the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We propose a regression framework based on partially functional linear regression models to map high-dimensional GIC-related pathways for phenotypes of interest. We develop a joint model selection and estimation procedure by embedding imaging data in the reproducing kernel Hilbert space and imposing the $\ell_0$ penalty on the coefficients of scalar variables. We systematically investigate the theoretical properties of the efficient estimators of the scalar and functional coefficients, including non-asymptotic error bounds, minimax error bounds, and asymptotic normality. We apply the proposed method to the ADNI dataset to identify important features from several million genetic polymorphisms and to study the effects of a set of informative genetic variants and the hippocampus surface on thirteen cognitive variables.
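The abstract does not spell out the estimator, so the following is only a rough illustration of the general idea, not the authors' procedure. It is a minimal NumPy sketch, assuming a partially functional linear model $y_i = x_i^\top\beta + \int Z_i(s)\gamma(s)\,ds + \varepsilon_i$ with the curves observed on a grid. The functional coefficient is represented as $\gamma = Kc$ on the grid with RKHS norm $c^\top K c$, and the sketch alternates an exact kernel-ridge solve for $c$ with one iterative-hard-thresholding (proximal gradient) step for the $\ell_0$-penalized $\beta$. The kernel $1 + \min(s,t)$, the penalty levels `lam` and `mu`, the function names `fit_pflr_l0` and `hard_threshold`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def hard_threshold(v, tau):
    """Proximal map of the l0 penalty: keep entries with |v_j| > tau, zero the rest."""
    return np.where(np.abs(v) > tau, v, 0.0)


def fit_pflr_l0(y, X, Z, grid, lam, mu, n_iter=200):
    """Block coordinate descent for the (assumed) objective
        (1/2n)||y - X beta - Z gamma / m||^2 + lam*||beta||_0 + mu*||gamma||_H^2,
    with gamma = K c on the grid and ||gamma||_H^2 = c^T K c."""
    n, p = X.shape
    m = grid.size
    K = 1.0 + np.minimum.outer(grid, grid)   # hypothetical kernel: 1 + min(s, t)
    A = Z @ K / m                             # Riemann sum of int Z_i(s) gamma(s) ds
    L = np.linalg.norm(X, 2) ** 2 / n         # Lipschitz constant of the smooth part in beta
    eta = 1.0 / L                             # step size that makes each IHT step monotone
    tau = np.sqrt(2.0 * lam * eta)            # hard-threshold level of prox(eta * lam * ||.||_0)
    M = A.T @ A / n + 2.0 * mu * K            # normal equations of the c-step (positive definite)
    beta, c = np.zeros(p), np.zeros(m)

    def objective(beta, c):
        r = y - X @ beta - A @ c
        return r @ r / (2 * n) + lam * np.count_nonzero(beta) + mu * c @ K @ c

    history = [objective(beta, c)]
    for _ in range(n_iter):
        # c-step: exact minimizer over c for the current beta
        c = np.linalg.solve(M, A.T @ (y - X @ beta) / n)
        # beta-step: gradient step on the least-squares part, then hard thresholding
        grad = -X.T @ (y - X @ beta - A @ c) / n
        beta = hard_threshold(beta - eta * grad, tau)
        history.append(objective(beta, c))
    return beta, K @ c, np.array(history)


# Simulated toy data (hypothetical): 3 active scalar covariates, smooth curves Z_i(s).
rng = np.random.default_rng(0)
n, p, m = 200, 50, 50
grid = (np.arange(m) + 0.5) / m
X = rng.standard_normal((n, p))
basis = np.array([np.sqrt(2) * np.cos(k * np.pi * grid) for k in range(1, 5)])
Z = (rng.standard_normal((n, 4)) / np.arange(1, 5)) @ basis
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
gamma_true = 2.0 * basis[0]
y = X @ beta_true + Z @ gamma_true / m + 0.5 * rng.standard_normal(n)

beta_hat, gamma_hat, hist = fit_pflr_l0(y, X, Z, grid, lam=0.1, mu=1e-3)
print("selected scalar covariates:", np.flatnonzero(beta_hat))
```

Because the $c$-step is an exact minimization and the step size $1/L$ makes each hard-thresholding step a majorize-minimize update, the recorded objective should not increase across iterations. A full-scale analysis with millions of genetic variants would need screening and far more efficient solvers than this toy loop.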