Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise a mapping of the vertices of a gene tree $T$ to vertices and edges of a species tree $S$. The relative timing of the last common ancestor of two extant genes (leaves of $T$) and the last common ancestor of the two species (leaves of $S$) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with the corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph, and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial-time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show, however, that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial time for general scenarios, this extra information can be dropped in the HGT-free case. In contrast, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. Finally, we connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
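To make the edge partition concrete, the following minimal Python sketch classifies every pair of extant genes into the LDT, EDT, or PDT graph by comparing the age of their last common ancestor in $T$ with the age of the last common ancestor of their species in $S$. Several things here are assumptions made for illustration and are not taken from the text: trees are given as child-to-parent dictionaries, every vertex carries an age measured backwards from the present (leaves have age 0), two genes from the same species are compared against that species leaf itself, and all names (`timing_graphs`, `sigma`, the toy trees) are hypothetical. It only illustrates the definitions and is not the recognition or reconstruction algorithm of the paper.

```python
from itertools import combinations


def ancestors(parent, v):
    """Return v followed by all of its ancestors up to the root."""
    path = [v]
    while v in parent:  # the root has no entry in `parent`
        v = parent[v]
        path.append(v)
    return path


def lca(parent, a, b):
    """Last common ancestor of a and b in a tree given as child -> parent."""
    seen = set(ancestors(parent, a))
    for v in ancestors(parent, b):
        if v in seen:
            return v
    raise ValueError("vertices do not share a root")


def timing_graphs(gene_parent, gene_age, species_parent, species_age, sigma):
    """Split all gene pairs into (LDT, EDT, PDT) edge sets.

    Ages grow towards the past (an assumption of this sketch), so a
    *later* gene divergence means a *smaller* age than the species divergence.
    """
    ldt, edt, pdt = set(), set(), set()
    for x, y in combinations(sorted(sigma), 2):
        t_gene = gene_age[lca(gene_parent, x, y)]
        t_species = species_age[lca(species_parent, sigma[x], sigma[y])]
        edge = frozenset((x, y))
        if t_gene < t_species:
            ldt.add(edge)   # genes diverged after the species
        elif t_gene == t_species:
            edt.add(edge)   # genes diverged together with the species
        else:
            pdt.add(edge)   # genes diverged before the species
    return ldt, edt, pdt


# Hypothetical toy instance: two species A and B, four genes.
species_parent = {"A": "R", "B": "R"}
species_age = {"A": 0, "B": 0, "R": 2}

gene_parent = {"a1": "g2", "b2": "g2", "g2": "g1", "b1": "g1",
               "g1": "g0", "a2": "g0"}
gene_age = {"a1": 0, "a2": 0, "b1": 0, "b2": 0, "g2": 1, "g1": 2, "g0": 3}
sigma = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}  # gene -> species (color)

ldt, edt, pdt = timing_graphs(gene_parent, gene_age,
                              species_parent, species_age, sigma)
for name, edges in (("LDT", ldt), ("EDT", edt), ("PDT", pdt)):
    print(name, sorted(sorted(e) for e in edges))
```

In this toy instance the pair a1, b2 falls into the LDT graph (as would be expected after a transfer), the pair a1, b1 into the EDT graph, and the remaining four pairs into the PDT graph, so the three edge sets together cover all six pairs exactly once.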