Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented by a fully bifurcating process, because such a process cannot capture important biological realities such as hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet they do not scale to big data, because they must perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard.

Results: Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of 4-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Our inference methodology is optimization-free, as it only requires the evaluation of polynomial equations; it therefore bypasses the traversal of network space and runs at least 10 times faster than the fastest network methods to date. We illustrate the accuracy and speed of our new method on a variety of simulated scenarios, as well as in the estimation of a phylogenetic network for the genus Canis.

Availability and Implementation: We implement our theory in PhyloDiamond.jl, an open-source Julia package publicly available at https://github.com/solislemuslab/PhyloDiamond.jl, with broad applicability within the evolutionary biology community.

Contact: solislemus@wisc.edu
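To make "phylogenetic invariants on concordance factors" concrete, the sketch below works through the simplest possible case: a single 4-taxon species tree, not the level-1 network invariants the method actually uses. It relies on a standard multispecies coalescent result: for a quartet species tree ab|cd whose internal branch has length t in coalescent units, a gene tree displays ab|cd with probability 1 - (2/3)e^(-t) and each of the two other splits with probability (1/3)e^(-t). The two minor concordance factors are therefore equal, so CF(ac|bd) - CF(ad|bc) = 0 is a (linear) polynomial invariant of that tree. Testing it means evaluating a polynomial, with no optimization involved. All function names are hypothetical and are not the PhyloDiamond.jl API. Python is used only for illustration, and split labels assume single-character taxon names.

```python
import math
from collections import Counter


def split_label(pair, taxa):
    """Label the quartet split that puts `pair` on one side, e.g. 'ab|cd'.

    The side containing taxa[0] is always written first, so ('c', 'd') and
    ('a', 'b') map to the same label. Assumes single-character taxon names.
    """
    if len(set(pair)) != 2 or not set(pair) <= set(taxa):
        raise ValueError(f"{pair!r} is not a pair of taxa from {taxa!r}")
    side = sorted(pair, key=taxa.index)
    other = sorted(set(taxa) - set(pair), key=taxa.index)
    if taxa[0] not in side:
        side, other = other, side
    return "".join(side) + "|" + "".join(other)


def concordance_factors(gene_tree_pairs, taxa):
    """Fraction of gene trees displaying each of the three quartet splits.

    Hypothetical input format: one entry per gene tree, giving the two taxa
    that the gene tree (restricted to `taxa`) groups together.
    """
    taxa = tuple(taxa)
    labels = [split_label((taxa[0], x), taxa) for x in taxa[1:]]
    counts = Counter(split_label(p, taxa) for p in gene_tree_pairs)
    n = len(gene_tree_pairs)
    return {label: counts[label] / n for label in labels}


def expected_tree_cfs(t):
    """Expected CFs under the MSC for species tree ab|cd with internal
    branch length t (coalescent units)."""
    minor = math.exp(-t) / 3
    return {"ab|cd": 1 - 2 * minor, "ac|bd": minor, "ad|bc": minor}


def tree_invariant_residual(cfs):
    """Linear invariant of the quartet tree ab|cd: the two minor CFs are
    equal, so this is 0 (up to sampling noise) if ab|cd is the species tree."""
    return cfs["ac|bd"] - cfs["ad|bc"]


if __name__ == "__main__":
    taxa = ("a", "b", "c", "d")
    # Ten hypothetical gene trees: six group a with b, two a with c, two a with d.
    pairs = [("a", "b")] * 6 + [("a", "c")] * 2 + [("a", "d")] * 2
    cfs = concordance_factors(pairs, taxa)
    print(cfs)                           # {'ab|cd': 0.6, 'ac|bd': 0.2, 'ad|bc': 0.2}
    print(tree_invariant_residual(cfs))  # 0.0
```

With real gene trees the residual is only approximately zero because of sampling noise. The invariants that characterize level-1 networks involve concordance factors from many quartets at once and are not reproduced by this toy example.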