Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that can be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees, but it does not depend on partitioning sequences by genes. Thus, under standard stochastic models, statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
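To make the central quantity concrete, the following is a minimal sketch of a pairwise logDet distance computed directly from two aligned sequences, with no partitioning by gene. The abstract does not state which form of the distance is used, so the formula here is an assumption: the commonly used version d = -(1/4) [ ln|det F| - (1/2) ln(g_a g_b) ], where F is the 4x4 matrix of relative site-pattern frequencies and g_a, g_b are the products of its row and column sums. The function name `logdet_distance` and the choice to skip sites with gaps or ambiguity codes are likewise illustrative assumptions. The sketch covers only the distance computation, not the recovery of a network from the distances.

```python
import numpy as np

BASES = "ACGT"

def logdet_distance(seq_a, seq_b):
    """Pairwise logDet distance between two aligned DNA sequences.

    Assumed form: d = -1/4 * (ln|det F| - 1/2 * ln(g_a * g_b)),
    where F holds relative frequencies of site patterns (row = base in
    seq_a, column = base in seq_b) and g_a, g_b are the products of the
    row sums and column sums of F.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    index = {base: i for i, base in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if x in index and y in index:
            counts[index[x], index[y]] += 1

    total = counts.sum()
    if total == 0:
        raise ValueError("no comparable sites")
    F = counts / total

    # slogdet returns (sign, ln|det F|) and avoids underflow in det F.
    sign, log_abs_det = np.linalg.slogdet(F)
    if sign == 0:
        raise ValueError("frequency matrix is singular; distance undefined")

    # ln(g_a * g_b) as a sum of logs of the marginal base frequencies.
    log_g = np.log(F.sum(axis=1)).sum() + np.log(F.sum(axis=0)).sum()
    return -0.25 * (log_abs_det - 0.5 * log_g)
```

In a genomic-scale setting the two arguments would be the concatenated alignments for a pair of taxa, and evaluating the function over all pairs gives the distance matrix that a distance-based method would take as input.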