In this paper, we propose double machine learning procedures for estimating the genetic relatedness between two traits in a model-free framework. Most existing methods require specifying parametric models that relate the traits to the genetic variants, and the bias caused by model misspecification can yield misleading statistical results. Moreover, semiparametric efficiency bounds for estimators of genetic relatedness are still lacking. We develop semiparametrically efficient, model-free estimators and construct valid confidence intervals for two important measures of genetic relatedness, genetic covariance and genetic correlation, allowing for both continuous and discrete responses. Based on the derived efficient influence functions of these measures, we propose an estimator of the genetic covariance that is consistent as long as one of the genetic values is consistently estimated. The data on the two traits may be collected from the same group or from different groups of individuals. Numerical studies illustrate the proposed procedures. We also apply the proposed procedures to Carworth Farms White mice genome-wide association study data.
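To make the robustness claim concrete, the following is a minimal sketch and not the paper's actual estimator. It assumes that the genetic value of a trait is its conditional mean given the genotypes, f(X) = E[Y | X] and g(X) = E[Z | X], that the genetic covariance is Cov(f(X), g(X)), and that both traits are measured on the same individuals. Under these assumptions, the cross-fitted score Y·ĝ + f̂·Z − f̂·ĝ has mean E[fg] − E[(f − f̂)(g − ĝ)], so the error enters only through a product and vanishes if either fitted genetic value is accurate. The function names, the ordinary least squares learner (a placeholder for any machine learning method), and the plug-in standard error are all illustrative assumptions.

```python
import numpy as np


def fit_linear(X, y):
    """Placeholder learner: least squares with an intercept.

    Any regression method that returns a prediction function could be
    substituted here; OLS is used only to keep the sketch dependency-free.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def genetic_covariance_dr(X, y, z, n_folds=2, seed=0):
    """Cross-fitted, product-bias estimate of Cov(E[Y|X], E[Z|X]).

    Hypothetical helper for illustration; assumes y and z are observed
    on the same n individuals with genotype matrix X of shape (n, p).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    psi = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance fits use only the training folds (cross-fitting).
        f_hat = fit_linear(X[train], y[train])(X[test])
        g_hat = fit_linear(X[train], z[train])(X[test])
        # Score whose mean differs from E[f g] only by a product of errors.
        psi[test] = y[test] * g_hat + f_hat * z[test] - f_hat * g_hat
    mu_y, mu_z = y.mean(), z.mean()
    est = psi.mean() - mu_y * mu_z
    # Plug-in influence values for E[f g] - E[Y] E[Z] (sketch assumption).
    phi = psi - mu_z * y - mu_y * z + mu_y * mu_z - est
    se = phi.std(ddof=1) / np.sqrt(n)
    return est, se
```

An approximate 95% confidence interval under these assumptions would be `est ± 1.96 * se`. Handling traits collected from different groups of individuals, or the genetic correlation, would require a different construction that is not sketched here.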