Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
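To make the notion of average pairwise distance concrete, the following minimal sketch computes it on a toy network with a single reticulation. It assumes the common definition in which the average distance between two taxa is the expectation of their path length over the displayed trees, each tree weighted by the product of the inheritance probabilities of the hybrid edges it retains. The toy network, its edge lengths, its inheritance values, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy network: each edge is (parent, child, length, gamma).
# Tree edges carry gamma = 1.0; the two edges entering the hybrid node "h"
# carry inheritance probabilities that sum to 1.
EDGES = [
    ("root", "u", 1.0, 1.0),
    ("root", "v", 1.0, 1.0),
    ("u", "a", 1.0, 1.0),
    ("v", "c", 1.0, 1.0),
    ("u", "h", 0.5, 0.3),  # hybrid edge
    ("v", "h", 0.5, 0.7),  # hybrid edge
    ("h", "b", 1.0, 1.0),
]


def displayed_trees(edges):
    """Yield (probability, tree_edges) for every displayed tree.

    A displayed tree keeps exactly one parent edge at each hybrid node.
    """
    parent_edges = defaultdict(list)
    for edge in edges:
        parent_edges[edge[1]].append(edge)
    tree_edges = [es[0] for es in parent_edges.values() if len(es) == 1]
    hybrid_choices = [es for es in parent_edges.values() if len(es) > 1]
    for choice in product(*hybrid_choices):
        prob = 1.0
        for edge in choice:
            prob *= edge[3]
        yield prob, tree_edges + list(choice)


def tree_distance(tree_edges, x, y):
    """Return the path length between nodes x and y in an undirected tree."""
    adjacency = defaultdict(list)
    for parent, child, length, _ in tree_edges:
        adjacency[parent].append((child, length))
        adjacency[child].append((parent, length))
    stack = [(x, None, 0.0)]
    while stack:
        node, previous, dist = stack.pop()
        if node == y:
            return dist
        for neighbor, length in adjacency[node]:
            if neighbor != previous:
                stack.append((neighbor, node, dist + length))
    raise ValueError(f"no path between {x!r} and {y!r}")


def average_distance(edges, x, y):
    """Average the x-y distance over displayed trees, weighted by probability."""
    return sum(p * tree_distance(t, x, y) for p, t in displayed_trees(edges))


if __name__ == "__main__":
    for x, y in [("a", "b"), ("a", "c"), ("b", "c")]:
        print(x, y, round(average_distance(EDGES, x, y), 6))
```

Enumerating displayed trees is exponential in the number of reticulations, so this only illustrates the definition on small examples. It is not meant as an efficient algorithm, nor as the method used by the authors.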