We propose two new statistics, V and S, for disentangling the history of related populations from SNP frequency data. When the populations are related by a tree, we show both theoretically and by simulation that the new statistics identify the root of the tree correctly, in contrast to standard statistics such as the observed matrix of F2-statistics (distances between pairs of populations). The statistic V is obtained by averaging over all SNPs, as standard statistics are. Its expectation is the true covariance matrix of the observed population SNP frequencies, offset by a matrix whose entries are all identical. The statistic S, in contrast, is formulated in a Bayesian framework and is obtained by averaging over pairs of SNPs such that each SNP is used only once; it thus exploits the joint distribution of pairs of SNPs. In addition, we present a number of new mathematical results about both established and new statistics and about their relationships to one another.
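A small numerical illustration may help show what a matrix of pairwise distances such as F2 discards. The sketch below is a simplified illustration, not the paper's method. It assumes F2 is estimated as the plain average over SNPs of (p_i - p_j)^2, with no sample-size bias correction. It also introduces a hypothetical matrix M_ij, the average over SNPs of p_i * p_j, which is an illustrative stand-in and not the statistic V. Because (p_i - p_j)^2 = p_i^2 + p_j^2 - 2 p_i p_j, the F2 matrix equals M_ii + M_jj - 2 M_ij. That expression is unchanged when the same constant is added to every entry of M, so any information carried only by such an offset is invisible to F2.

```python
import random


def f2_matrix(freqs):
    """Empirical F2: average over SNPs of (p_i - p_j)^2.

    freqs[k][i] is the allele frequency of SNP k in population i.
    Simplifying assumption: no finite-sample bias correction is applied.
    """
    n_snps, n_pops = len(freqs), len(freqs[0])
    f2 = [[0.0] * n_pops for _ in range(n_pops)]
    for snp in freqs:
        for i in range(n_pops):
            for j in range(n_pops):
                f2[i][j] += (snp[i] - snp[j]) ** 2 / n_snps
    return f2


def product_matrix(freqs):
    """Hypothetical M_ij = average over SNPs of p_i * p_j (illustration only, not V)."""
    n_snps, n_pops = len(freqs), len(freqs[0])
    m = [[0.0] * n_pops for _ in range(n_pops)]
    for snp in freqs:
        for i in range(n_pops):
            for j in range(n_pops):
                m[i][j] += snp[i] * snp[j] / n_snps
    return m


def distances_from(k):
    """Pairwise distances implied by a covariance-like matrix: K_ii + K_jj - 2 K_ij."""
    n = len(k)
    return [[k[i][i] + k[j][j] - 2 * k[i][j] for j in range(n)] for i in range(n)]


def offset(k, c):
    """Add the same constant c to every entry of k."""
    return [[x + c for x in row] for row in k]


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two equal-shaped matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Random frequencies for 1000 SNPs in 4 populations (synthetic, not real data).
random.seed(1)
freqs = [[random.random() for _ in range(4)] for _ in range(1000)]

f2 = f2_matrix(freqs)
m = product_matrix(freqs)
d_original = distances_from(m)              # equals f2 up to rounding
d_shifted = distances_from(offset(m, 0.25))  # same distances after a constant offset

print(max_abs_diff(f2, d_original), max_abs_diff(d_original, d_shifted))
```

Both printed differences are at the level of floating-point rounding. Distinct matrices that differ only by a constant offset therefore yield the same F2 matrix.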