Horizontal gene transfer events partition a gene tree $T$, and thus its leaf set, into subsets of genes whose evolutionary history is described by speciation and duplication events alone. Indirect phylogenetic methods can be used to infer such partitions $\mathcal{P}$ from sequence similarity or evolutionary distances without any a priori knowledge about the underlying tree $T$. In this contribution, we assume that such a partition $\mathcal{P}$ of a set of genes $X$ is given and that, independently, an estimate $T$ of the original gene tree on $X$ has been derived. We then ask to what extent $T$ and the xenology information, i.e., $\mathcal{P}$, can be combined to determine the horizontal transfer edges in $T$. We show that for each pair of genes $x$ and $y$ lying in different parts of $\mathcal{P}$, it can be decided whether a horizontal gene transfer always exists or never exists in $T$ along the path connecting $y$ and the most recent common ancestor of $x$ and $y$. This problem is equivalent to determining the presence or absence of the directed edge $(x,y)$ in so-called Fitch graphs, a more fine-grained version of the graphs that represent the dependencies between the sets in $\mathcal{P}$. We then consider the generalization to insufficiently resolved gene trees and show that analogous results can be obtained. We show that the classification of $(x,y)$ can be computed in constant time after linear-time preprocessing. Using simulated gene family histories, we observe empirically that the vast majority of horizontal transfer edges in the gene tree $T$ can be recovered unambiguously.
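To make the Fitch-graph condition concrete, the following minimal sketch computes the directed edges $(x,y)$ for a toy tree in which the transfer edges are already known. The function name `fitch_graph`, the parent-map encoding of the tree, and the toy data are illustrative assumptions and do not come from the paper. The sketch walks each path naively, so it is not the constant-time query with linear-time preprocessing mentioned above.

```python
from itertools import permutations


def fitch_graph(parent, transfer_edges, leaves):
    """Return the directed edges (x, y) such that the path from
    lca(x, y) down to y contains at least one transfer edge.

    parent         -- dict mapping each non-root vertex to its parent (assumed encoding)
    transfer_edges -- set of (parent, child) pairs marked as horizontal transfers
    leaves         -- iterable of leaf labels (the genes)
    """

    def ancestors(v):
        # All vertices on the path from v up to the root, including v itself.
        out = [v]
        while v in parent:
            v = parent[v]
            out.append(v)
        return out

    edges = set()
    for x, y in permutations(leaves, 2):
        anc_x = set(ancestors(x))
        v = y
        has_transfer = False
        # Climb from y until we reach an ancestor of x; that vertex is lca(x, y).
        while v not in anc_x:
            u = parent[v]
            if (u, v) in transfer_edges:
                has_transfer = True
            v = u
        if has_transfer:
            edges.add((x, y))
    return edges


# Toy tree (hypothetical): root r has children u and c; u has children a and b.
parent = {"a": "u", "b": "u", "u": "r", "c": "r"}
# Assume the edge from u to b is a horizontal transfer.
transfer_edges = {("u", "b")}

fitch = fitch_graph(parent, transfer_edges, ["a", "b", "c"])
print(sorted(fitch))
```

In this toy example, removing the single transfer edge separates the leaves into the parts $\{a, c\}$ and $\{b\}$, and the only directed edges produced point into $b$ from genes in the other part.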