Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain about the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and the lengths of edges adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's blob tree, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the blob tree refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles with regular tree and hybrid edges from the semidirected network, and its edge parameters encode all information identifiable from average pairwise distances.
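To make the notion of average pairwise distance concrete, the following minimal Python sketch computes it on a toy network with a single 4-cycle. It assumes that the average distance between two tips is the mean of their path lengths over the trees displayed by the network, weighted by the inheritance probabilities of the hybrid edges retained in each tree. The node names, edge lengths, the value gamma = 0.3, and the function names are invented for illustration and do not come from the paper.

```python
from collections import defaultdict

def tree_distances(edges, tips):
    """Path-length distances between tips of a tree given as (u, v, length) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {}
    for s in tips:
        # Depth-first traversal from tip s, accumulating path length.
        stack = [(s, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node in tips and node != s:
                dist[(s, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return dist

def average_distances(displayed, tips):
    """Weighted average of tree distances; displayed = [(probability, edges), ...]."""
    avg = defaultdict(float)
    for prob, edges in displayed:
        for pair, d in tree_distances(edges, tips).items():
            avg[pair] += prob * d
    return dict(avg)

# Hypothetical 4-cycle u - w - v - x, where x is the hybrid node with
# parents u (probability gamma) and v (probability 1 - gamma).
# Tips a, b, c, h hang off u, w, v, x respectively.
gamma = 0.3
tips = {"a", "b", "c", "h"}
shared = [("a", "u", 1.0), ("b", "w", 1.0), ("c", "v", 1.0),
          ("u", "w", 1.0), ("w", "v", 1.0), ("x", "h", 1.0)]
tree_keep_u = shared + [("u", "x", 0.5)]  # displayed tree keeping hybrid edge u -> x
tree_keep_v = shared + [("v", "x", 0.5)]  # displayed tree keeping hybrid edge v -> x

avg = average_distances([(gamma, tree_keep_u), (1 - gamma, tree_keep_v)], tips)
for pair in sorted(avg):
    print(pair, round(avg[pair], 6))
```

Only distances involving the hybrid tip h depend on gamma in this toy example; the distances among a, b and c are the same in both displayed trees. Enumerating displayed trees scales exponentially with the number of reticulations, so this is meant only to illustrate the definition, not as a practical method.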