Statistical Inference for Genetic Relatedness Based on High-Dimensional Logistic Regression

This paper studies statistical inference for genetic relatedness between binary traits based on individual-level genome-wide association data. Specifically, under high-dimensional logistic regression models, we define parameters characterizing the cross-trait genetic correlation, the genetic covariance, and the trait-specific genetic variance. A novel weighted debiasing method is developed for the logistic Lasso estimator, and computationally efficient debiased estimators are proposed. The rates of convergence of these estimators are studied, and their asymptotic normality is established under mild conditions. Moreover, we construct confidence intervals and statistical tests for these parameters and provide theoretical justification for the methods, including the coverage probability and expected length of the confidence intervals, as well as the size and power of the proposed tests. Numerical studies on both model-generated data and simulated genetic data show the superiority of the proposed methods. By analyzing a real data set on autoimmune diseases, we demonstrate the ability of the proposed methods to yield novel insights into the shared genetic architecture among ten pediatric autoimmune diseases.
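To make the three parameters concrete, the sketch below computes naive plug-in versions of them. It rests on assumptions not stated in the abstract: that the two traits have logistic regression coefficient vectors (called `beta` and `gamma` here), that the genotype covariance matrix is Sigma, and that the genetic covariance, variance, and correlation take the common quadratic-form definitions beta' Sigma gamma, beta' Sigma beta, and their normalized ratio. The function name `plug_in_relatedness` is hypothetical. This is not the weighted debiased estimator the paper proposes; it only illustrates the quantities being estimated.

```python
import numpy as np


def plug_in_relatedness(X, beta, gamma):
    """Naive plug-in estimates of genetic covariance, variances and correlation.

    Assumed definitions (not taken from the abstract):
        covariance  = beta' Sigma gamma
        variance    = beta' Sigma beta   (and likewise for gamma)
        correlation = covariance / sqrt(variance_1 * variance_2)
    Sigma is replaced by the sample covariance of the genotype matrix X.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each genotype column

    # Work with the n-vectors X beta and X gamma instead of forming the
    # p x p matrix Sigma, which is costly when p is much larger than n.
    u = Xc @ beta
    v = Xc @ gamma

    covariance = float(u @ v) / n
    variance_1 = float(u @ u) / n
    variance_2 = float(v @ v) / n

    denom = np.sqrt(variance_1 * variance_2)
    # Convention assumed here: report 0 when either variance is 0.
    correlation = covariance / denom if denom > 0 else 0.0

    return {
        "covariance": covariance,
        "variance_1": variance_1,
        "variance_2": variance_2,
        "correlation": correlation,
    }
```

In practice `beta` and `gamma` would be replaced by estimates such as logistic Lasso fits, and plugging those in directly generally inherits the shrinkage bias of the Lasso. That bias is the motivation for a debiasing step of the kind the abstract describes.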
