Scalable Bayesian divergence time estimation with ratio transformations

Divergence time estimation is crucial to provide temporal signals for dating biologically important events, from species divergence to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference on highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original N - 1 internal node heights into a space of one height parameter and N - 2 ratio parameters. To make analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in four pathogenic virus phylogenies: West Nile virus, rabies virus, Lassa virus and Ebola virus. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples. For the Ebola virus example, our method also makes it computationally feasible to incorporate mixed-effects molecular clock models, confirms the findings from the original study, and reveals clearer multimodal distributions of the divergence times of some clades of interest.
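
To make the reparameterization concrete, the following is a minimal Python sketch of one way such a ratio transform could be defined. The abstract does not give the formula, so the definition used here is an assumption: each non-root internal node gets the ratio r_i = (h_i - b_i) / (h_parent(i) - b_i), where b_i is the height of the oldest tip below node i. The function names (`preorder`, `tip_bounds`, `heights_to_ratios`, `ratios_to_heights`) and the toy tree are hypothetical and for illustration only. Each function visits every node a constant number of times, so the passes run in linear time in the number of nodes.

```python
import math


def preorder(children, root):
    """Return nodes in pre-order (each parent before its descendants)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, ()))
    return order


def tip_bounds(children, tip_heights, root):
    """Post-order pass: height of the oldest tip below each node (assumed lower bound)."""
    bound = dict(tip_heights)
    for node in reversed(preorder(children, root)):
        if node in children:  # internal node
            bound[node] = max(bound[c] for c in children[node])
    return bound


def heights_to_ratios(children, heights, root):
    """Map N - 1 internal node heights to one root height and N - 2 ratios."""
    tips = {n: h for n, h in heights.items() if n not in children}
    bound = tip_bounds(children, tips, root)
    ratios = {}
    for parent in preorder(children, root):
        for child in children.get(parent, ()):
            if child in children:  # only non-root internal nodes get a ratio
                ratios[child] = (heights[child] - bound[child]) / (
                    heights[parent] - bound[child]
                )
    return heights[root], ratios


def ratios_to_heights(children, tip_heights, root, root_height, ratios):
    """Inverse map, plus the log-Jacobian of heights with respect to the ratios."""
    bound = tip_bounds(children, tip_heights, root)
    heights = dict(tip_heights)
    heights[root] = root_height
    log_jacobian = 0.0
    for parent in preorder(children, root):
        for child in children.get(parent, ()):
            if child in children:
                span = heights[parent] - bound[child]
                heights[child] = bound[child] + ratios[child] * span
                # The Jacobian is triangular in pre-order, so its log-determinant
                # is the sum of log(span) over non-root internal nodes.
                log_jacobian += math.log(span)
    return heights, log_jacobian


# Toy tree with N = 4 tips sampled at different times (hypothetical data).
children = {"R": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
tip_heights = {"A": 0.0, "B": 1.0, "C": 0.0, "D": 0.5}
heights = {**tip_heights, "X": 2.0, "Y": 1.5, "R": 4.0}

root_height, ratios = heights_to_ratios(children, heights, "R")
recovered, log_jacobian = ratios_to_heights(
    children, tip_heights, "R", root_height, ratios
)
```

Under this assumed definition every ratio lies in (0, 1) whenever the heights are valid, so a sampler such as Hamiltonian Monte Carlo can move each ratio independently of the others without violating the ordering constraints between parent and child nodes. The gradient computations described in the abstract are not reproduced here.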
