Evaluation of population structure inferred by principal component analysis or the admixture model

Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violations of the model assumptions. Based on simulations and genome-wide human data, we show that our assessment of fit can be used to guide the interpretation of the data and to pinpoint individuals that are not well represented by the chosen principal components. Our method works equally well on other similar models, such as the admixture model, where the mean of the data is represented by a linear matrix decomposition.
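The idea in the abstract (predict genotypes from the top principal components, then inspect how the residuals correlate between individuals) can be illustrated with a naive sketch. This is not the authors' implementation: the function name `residual_correlation`, the simulated two-population data, and the choice of a plain truncated SVD are all assumptions made for illustration, and the sketch applies no correction for the bias that arises when the same data are used both to fit the components and to compute the residuals.

```python
import numpy as np

def residual_correlation(genotypes, n_components):
    """Correlation between individuals of the residuals of a rank-K PCA fit.

    Hypothetical helper for illustration only; rows are individuals,
    columns are sites, entries are genotype counts (0, 1 or 2).
    """
    G = np.asarray(genotypes, dtype=float)
    site_means = G.mean(axis=0)
    centered = G - site_means
    # Truncated SVD of the centered matrix gives the top principal components.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    predicted = site_means + (U[:, :k] * s[:k]) @ Vt[:k]
    residuals = G - predicted
    # np.corrcoef treats each row (individual) as one variable.
    return np.corrcoef(residuals)

# Simulated data (an assumption): two unadmixed populations with
# independent allele frequencies, 10 individuals each, 500 sites.
rng = np.random.default_rng(0)
n_sites = 500
freq_a = rng.uniform(0.05, 0.95, n_sites)
freq_b = rng.uniform(0.05, 0.95, n_sites)
pop_a = rng.binomial(2, freq_a, size=(10, n_sites))
pop_b = rng.binomial(2, freq_b, size=(10, n_sites))
genotypes = np.vstack([pop_a, pop_b])

def mean_within_population(corr):
    """Average off-diagonal residual correlation among the first 10 individuals."""
    block = corr[:10, :10]
    return block[~np.eye(10, dtype=bool)].mean()

corr_k0 = residual_correlation(genotypes, 0)  # no components: structure left in residuals
corr_k1 = residual_correlation(genotypes, 1)  # one component: structure largely absorbed
print(mean_within_population(corr_k0), mean_within_population(corr_k1))
```

With too few components, individuals from the same population share unexplained signal, so their residuals are positively correlated. Once enough components are included, that shared signal should largely disappear.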

PCA

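In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.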
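The properties described above (an orthogonal basis, ordered by variance, in which the projected coordinates are uncorrelated) can be checked numerically. The following is a minimal sketch assuming an eigendecomposition of the sample covariance matrix; the helper name `principal_components` and the synthetic 3-D points are illustrative choices, not part of the original text.

```python
import numpy as np

def principal_components(points):
    """Return variances, basis vectors (as columns) and the centered data.

    Hypothetical helper for illustration; rows of `points` are observations.
    """
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return eigvals[order], eigvecs[:, order], centered

# Synthetic correlated 3-D points (an assumption for the example).
rng = np.random.default_rng(1)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
points = rng.normal(size=(200, 3)) @ mixing

variances, components, centered = principal_components(points)
scores = centered @ components            # coordinates in the new basis
score_cov = np.cov(scores, rowvar=False)  # should be (numerically) diagonal

print(np.round(components.T @ components, 6))  # close to the identity matrix
print(np.round(score_cov, 6))
```

Keeping only the first few columns of `components` gives the lower-dimensional projection. The off-diagonal entries of `score_cov` being numerically zero is what "uncorrelated dimensions" means here.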
