Motivated by empirical arguments that are well known in the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique as a way to trade off velocity against veracity. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding, population stratification and environmental confounding factors, and study how different methods that are commonly used in practice trade off these two sources differently.