Gene gain-loss-duplication models are commonly based on continuous-time birth-death processes. Employed in a phylogenetic context, such models have become increasingly popular in studies of gene content evolution across multiple genomes. While the applications are becoming more varied and demanding, bioinformatics methods for probabilistic inference on copy numbers (or, in general, on integer-valued evolutionary characters) are scarce. We describe a flexible probabilistic framework for phylogenetic gain-loss-duplication models. The framework is based on a novel elementary representation by dependent random variables with well-characterized conditional distributions: binomial, Pólya (negative binomial), and Poisson. The corresponding graphical model yields exact numerical procedures for computing the likelihood and the posterior distribution of ancestral copy numbers. The resulting algorithms take quadratic time in the total number of copies. In addition, we show how the likelihood gradient can be computed by a linear-time algorithm.
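A short Python sketch can illustrate how binomial, Pólya, and Poisson variables combine into a branch transition probability P(m | n). The sketch assumes a textbook linear birth-death process with immigration: per-copy duplication rate λ (`lam`), per-copy loss rate μ (`mu`), and gains arriving at a total rate ν (`nu`) that does not depend on the current copy number. This parametrization may differ from the one used in the paper, and all function names are hypothetical. Under these assumptions, the surviving ancestral copies are binomial. The extra copies descending from the survivors are Pólya. Gains are Pólya when λ > 0 and Poisson when λ = 0. The sketch evaluates each entry by direct convolution. It is not the paper's quadratic-time algorithm.

```python
import math


def bd_params(lam, mu, t):
    """(p, q) for a linear birth-death process (assumed rates lam, mu) after time t.

    p: probability that one ancestral copy leaves no descendants.
    q: Polya/geometric parameter; a surviving copy has 1 + Geometric(q) copies.
    """
    if math.isclose(lam, mu):
        p = q = lam * t / (1.0 + lam * t)
    else:
        e = math.exp((lam - mu) * t)
        p = mu * (e - 1.0) / (lam * e - mu)
        q = lam * (e - 1.0) / (lam * e - mu)
    return p, q


def polya_pmf(j, r, q):
    """Polya (negative binomial) pmf C(r+j-1, j) (1-q)^r q^j for real shape r >= 0."""
    if j < 0:
        return 0.0
    if r == 0 or q == 0.0:
        return 1.0 if j == 0 else 0.0  # degenerate: point mass at zero
    log_coef = math.lgamma(r + j) - math.lgamma(r) - math.lgamma(j + 1)
    return math.exp(log_coef + r * math.log1p(-q) + j * math.log(q))


def poisson_pmf(j, mean):
    """Poisson pmf with the given mean (point mass at zero if mean == 0)."""
    if mean == 0.0:
        return 1.0 if j == 0 else 0.0
    return math.exp(j * math.log(mean) - mean - math.lgamma(j + 1))


def gain_pmf(j, lam, mu, nu, t, q):
    """Copies present at time t that descend from gains (immigration at rate nu)."""
    if lam > 0.0:
        # With duplication, gained copies and their duplicates are Polya(nu/lam, q).
        return polya_pmf(j, nu / lam, q)
    # Without duplication, the surviving gained copies are Poisson.
    mean = nu * t if mu == 0.0 else nu * (1.0 - math.exp(-mu * t)) / mu
    return poisson_pmf(j, mean)


def transition_prob(m, n, lam, mu, nu, t):
    """P(m copies at the child | n copies at the parent) on a branch of length t."""
    p, q = bd_params(lam, mu, t)
    total = 0.0
    for s in range(min(n, m) + 1):  # s: ancestral copies that survive (binomial)
        w_surv = math.comb(n, s) * (1.0 - p) ** s * p ** (n - s)
        for g in range(m - s + 1):  # g: copies descending from gains
            # the remaining m - s - g copies are duplicates of the s survivors (Polya)
            total += w_surv * gain_pmf(g, lam, mu, nu, t, q) * polya_pmf(m - s - g, s, q)
    return total


if __name__ == "__main__":
    row = [transition_prob(m, 3, 0.5, 0.7, 0.2, 1.0) for m in range(10)]
    print([round(x, 4) for x in row])
```

As a sanity check, each row of transition probabilities should sum to 1. The expected copy number should equal n·e^{(λ−μ)t} + ν(e^{(λ−μ)t} − 1)/(λ − μ), the standard mean of a linear birth-death process with immigration.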