We consider the problem of jointly identifying the ancestral sequence, the phylogeny, and the parameters in models of DNA sequence evolution with insertions and deletions (indels). Under the classical TKF91 model of sequence evolution, we obtain explicit formulas for the root sequence, the pairwise distances between leaf sequences, and the scaled rates of indel and substitution, all expressed in terms of the distribution of the leaf sequences of an arbitrary phylogeny. These explicit formulas not only strengthen existing invertibility results and apply to phylogenies that are not necessarily ultrametric, but also lead to new estimators that require fewer assumptions than those in the existing literature. Our simulation study demonstrates that these estimators are statistically consistent as the number of independent samples tends to infinity.