Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. Because of their discrete geometry, they are also objects of interest in pure mathematics, in fields such as algebraic geometry and combinatorics. Although they are important data structures, they pose a significant challenge: sets of trees form a non-Euclidean phylogenetic tree space, so standard computational and statistical methods cannot be applied directly. In this work, we explore the statistical feasibility of a purely mathematical representation of the set of all phylogenetic trees based on tropical geometry. We show that the tropical geometric phylogenetic tree space, endowed with a generalized Hilbert projective metric, exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics. Moreover, this approach offers greater computational efficiency and statistical performance than the current state of the art, which we illustrate with a real data example on seasonal influenza. Our results demonstrate the viability of the tropical geometric setting for parametric statistical and probabilistic studies of sets of phylogenetic trees.
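For concreteness, the generalized Hilbert projective metric mentioned above is commonly written as d_tr(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i). The following is a minimal sketch of that formula, assuming each tree is encoded as a vector of pairwise leaf-to-leaf distances; the function name `tropical_distance` and the example vectors are illustrative assumptions and are not taken from the paper.

```python
def tropical_distance(u, v):
    """Tropical (generalized Hilbert projective) distance between two
    equal-length vectors: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


# Hypothetical three-leaf trees, encoded as (d_AB, d_AC, d_BC).
tree_1 = [2, 4, 4]
tree_2 = [4, 4, 2]

distance = tropical_distance(tree_1, tree_2)

# Adding the same constant to every coordinate of one vector leaves the
# distance unchanged, which is why the metric is called projective.
shifted = [x + 1 for x in tree_1]
distance_shifted = tropical_distance(shifted, tree_2)
```

In this sketch, `distance` and `distance_shifted` agree, reflecting that the metric is defined on vectors up to a common additive shift.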