Phylogenetics is a classical methodology in computational biology that has become highly relevant for medical investigation of single-cell data, e.g., in the context of cancer development. Unfortunately, the exponential size of the tree space is a substantial obstacle for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) based methods, since these rely on local operations. Although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive auto-differentiation operations for learning the variational parameters. We propose VaiPhy, a remarkably fast VI-based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with state-of-the-art methods on real data and is considerably faster since it does not require auto-differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, which is, to the best of our knowledge, the first scheme for sampling branch lengths directly from the popular Jukes-Cantor model. We compare VaiPhy with the baselines in terms of density estimation and runtime. Additionally, we evaluate the reproducibility of the baselines. We provide our code on GitHub: \url{https://github.com/Lagergren-Lab/VaiPhy}.
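For readers unfamiliar with the Jukes-Cantor model mentioned above, the following minimal sketch computes its standard transition probabilities for a given branch length. It is only the textbook JC69 substitution model, not the JC sampler or any other part of VaiPhy; the function name `jc69_transition_matrix` and the convention that the branch length `t` is measured in expected substitutions per site are assumptions made for illustration.

```python
import math


def jc69_transition_matrix(t):
    """Return the 4x4 JC69 transition matrix for branch length t.

    Assumption (illustrative, not taken from the VaiPhy code): t is the
    expected number of substitutions per site, so t must be non-negative.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay   # probability the nucleotide is unchanged
    diff = 0.25 - 0.25 * decay   # probability of each specific other nucleotide
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    # Print the first row for a short and a long branch.
    for t in (0.1, 10.0):
        print(t, jc69_transition_matrix(t)[0])
```

For a branch of length zero the matrix is the identity, and as the branch length grows every entry approaches 1/4, the uniform stationary distribution of the model.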