We consider data on M SNPs from N individuals, each an admixture of K unknown ancient populations. Let $\Pi_{si}$ be the frequency of the reference allele for individual i at SNP s, so that the number of reference alleles at SNP s for a diploid individual is binomially distributed with parameters 2 and $\Pi_{si}$. We suppose $\Pi_{si}=\sum_{k=1}^K F_{sk}Q_{ki}$, where $F_{sk}$ is the allele frequency of SNP s in population k and $Q_{ki}$ is the proportion of population k in the ancestry of individual i. I am interested in the identifiability of F and Q, up to a relabelling of the ancient populations: under what conditions does $\Pi = F^1Q^1 = F^2Q^2$ imply $F^1 = F^2$ and $Q^1 = Q^2$? I show that the anchor condition (Cabreros and Storey, 2019) on one matrix, together with an independence condition on the other matrix, is sufficient for identifiability. I argue that the proof of the necessary condition in Cabreros and Storey (2019) is incorrect, and I provide a correct proof, which in addition does not require knowledge of the number of ancestral populations. I also provide abstract necessary and sufficient conditions for identifiability, and I show that one cannot deviate substantially from the anchor condition without losing identifiability. Finally, I give necessary and sufficient conditions for identifiability in the non-admixed case.
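To make the identifiability question concrete, the following minimal sketch simulates the model $\Pi = FQ$ with binomial genotypes and then builds a second factorisation of the same $\Pi$. Everything in it is an illustrative assumption rather than part of the results above: the dimensions, the numerical values of F and Q, the mixing matrix `A`, and the helper name `simulate_genotypes` are all hypothetical. It only shows that, for this particular choice in which every individual is admixed, two valid pairs (F, Q) that are not relabellings of each other give the same $\Pi$.

```python
import numpy as np


def simulate_genotypes(F, Q, rng):
    """Return (Pi, X): Pi = F @ Q and genotype counts X[s, i] ~ Binomial(2, Pi[s, i])."""
    Pi = F @ Q
    X = rng.binomial(2, Pi)
    return Pi, X


rng = np.random.default_rng(0)
M, K, N = 5, 2, 4  # SNPs, ancestral populations, individuals (illustrative sizes)

# F[s, k]: allele frequency of SNP s in population k (arbitrary values in [0.2, 0.8]).
F = rng.uniform(0.2, 0.8, size=(M, K))

# Q[k, i]: ancestry proportion of population k in individual i; columns sum to 1.
# Assumption: every individual is admixed (no entry equals 0 or 1).
Q = np.array([[0.2, 0.4, 0.6, 0.8],
              [0.8, 0.6, 0.4, 0.2]])

Pi, X = simulate_genotypes(F, Q, rng)

# A second factorisation: F2 = F A and Q2 = A^{-1} Q for a mixing matrix A
# close to the identity whose rows and columns each sum to 1.
eps = 0.05
A = np.array([[1 - eps, eps],
              [eps, 1 - eps]])
F2 = F @ A
Q2 = np.linalg.inv(A) @ Q

print("max |F Q - F2 Q2| =", np.abs(Pi - F2 @ Q2).max())
print("Q2 columns sum to:", Q2.sum(axis=0))
```

Because the columns of `A` sum to 1, each entry of `F2` is a convex combination of entries of `F` and stays a valid frequency; because every entry of `Q` is at least `eps`, `Q2` stays non-negative with columns summing to 1. The genotype data `X` therefore have the same distribution under both pairs.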