This paper studies statistical inference for the genetic relatedness between binary traits based on individual-level genome-wide association data. Specifically, under the high-dimensional logistic regression model, we define parameters characterizing the cross-trait genetic correlation, the genetic covariance, and the trait-specific genetic variance. A novel weighted debiasing method is developed for the logistic Lasso estimator, and computationally efficient debiased estimators are proposed. The rates of convergence of these estimators are studied, and their asymptotic normality is established under mild conditions. Moreover, we construct confidence intervals and statistical tests for these parameters and provide theoretical justification for the methods, including the coverage probability and expected length of the confidence intervals, as well as the size and power of the proposed tests. Numerical studies on both model-generated data and simulated genetic data show the superiority of the proposed methods and their applicability to the analysis of real genetic data. Finally, by analyzing a real data set on autoimmune diseases, we obtain novel insights into the shared genetic architecture of ten pediatric autoimmune diseases.
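The abstract names three target parameters without stating them. As a minimal sketch only, assume two logistic regression coefficient vectors `beta` and `gamma` (one per trait) and a genotype covariance matrix `sigma`, and assume the common quadratic-form definitions: genetic covariance `beta^T Sigma gamma`, trait-specific genetic variances `beta^T Sigma beta` and `gamma^T Sigma gamma`, and genetic correlation as their normalized ratio. The function names below are hypothetical, and the code evaluates the parameters at known inputs; it does not implement the weighted debiasing estimator.

```python
import math


def quad_form(u, sigma, v):
    """Return the bilinear form u^T Sigma v for plain Python lists."""
    return sum(
        u[i] * sigma[i][j] * v[j]
        for i in range(len(u))
        for j in range(len(v))
    )


def genetic_relatedness(beta, gamma, sigma):
    """Evaluate the assumed relatedness parameters at given coefficients.

    beta, gamma: coefficient vectors of the two logistic models (assumed names).
    sigma: covariance matrix of the genetic variants (assumed name).
    """
    covariance = quad_form(beta, sigma, gamma)
    variance_1 = quad_form(beta, sigma, beta)
    variance_2 = quad_form(gamma, sigma, gamma)
    denom = math.sqrt(variance_1 * variance_2)
    # Convention assumed here: the correlation is 0 when either variance is 0.
    correlation = covariance / denom if denom > 0 else 0.0
    return {
        "covariance": covariance,
        "variance_1": variance_1,
        "variance_2": variance_2,
        "correlation": correlation,
    }


if __name__ == "__main__":
    # Toy inputs, chosen only for illustration.
    beta = [1.0, 0.0]
    gamma = [1.0, 1.0]
    sigma = [[1.0, 0.0], [0.0, 1.0]]
    print(genetic_relatedness(beta, gamma, sigma))
```

In practice the coefficients are unknown and high-dimensional, so plugging Lasso estimates into these forms is biased, which is the gap a debiasing step is meant to address.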