It is now well documented that genetic covariance between functionally related traits leads to an uneven distribution of genetic variation across multivariate trait combinations, and possibly to a large part of phenotype space being inaccessible to evolution. How the size of this nearly-null genetic space translates to the broader phenome level is unknown. High-dimensional phenotype data to address these questions are now within reach; however, incorporating these data into genetic analyses remains a challenge. Multi-trait genetic analyses of more than a handful of traits are slow and often fail to converge when fit with REML. This makes it challenging to estimate the genetic covariance matrix ($\mathbf{G}$) underlying thousands of traits, let alone study its properties. We present a previously proposed REML algorithm that is feasible for high-dimensional genetic studies in the specific setting of a balanced nested half-sib design, which is common in quantitative genetics. We show that it substantially outperforms other common approaches when the number of traits is large, and we use it to investigate the bias in the estimated eigenvalues of $\mathbf{G}$ and in the estimated size of the nearly-null genetic subspace. We show that the high-dimensional biases observed are qualitatively similar to those established by asymptotic approximation in the simpler setting of a sample covariance matrix based on i.i.d. vector observations, and that interpreting the estimated size of the nearly-null genetic subspace requires considerable caution in high-dimensional studies of genetic variation. Our results provide the foundation for future research characterizing the asymptotic approximation of estimated genetic eigenvalues, and a statistical null distribution for phenome-wide studies of genetic variation.
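The eigenvalue bias mentioned for the simpler i.i.d. setting can be illustrated with a small simulation. The sketch below is not the REML algorithm described above and does not model a half-sib design; it only draws i.i.d. Gaussian vectors whose true covariance is the identity, so every true eigenvalue equals 1. The function name, the sample sizes, and the 0.25 cutoff for calling a direction "nearly null" are all assumptions made for illustration.

```python
import numpy as np

def sample_covariance_eigenvalues(n_obs, n_traits, seed=0):
    """Eigenvalues (descending) of the sample covariance of i.i.d. N(0, I) vectors.

    Hypothetical helper for illustration; the true covariance is the identity,
    so any spread in the returned eigenvalues is pure estimation error.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_obs, n_traits))
    s = np.cov(x, rowvar=False)          # n_traits x n_traits sample covariance
    return np.linalg.eigvalsh(s)[::-1]   # eigvalsh returns ascending order

# Assumed sizes: 200 observations of 100 traits (traits/observations = 0.5).
eigs = sample_covariance_eigenvalues(n_obs=200, n_traits=100)

# Arbitrary illustrative cutoff for a "nearly null" direction.
n_nearly_null = int((eigs < 0.25).sum())

print("largest estimated eigenvalue: ", eigs.max())
print("smallest estimated eigenvalue:", eigs.min())
print("mean estimated eigenvalue:    ", eigs.mean())
print("directions below cutoff:      ", n_nearly_null)
```

Even though no direction truly lacks variance here, the leading estimated eigenvalues are inflated and the trailing ones are deflated, while their mean stays close to 1. Some directions therefore appear nearly null purely through sampling error, which is the kind of caution the abstract urges when interpreting an estimated nearly-null subspace.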