Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. They are also objects of interest in areas of pure mathematics, such as algebraic geometry and combinatorics, owing to their discrete geometry. Although they are important data structures, they pose a significant challenge: sets of trees form a non-Euclidean phylogenetic tree space, so standard computational and statistical methods cannot be directly applied. In this work, we explore the statistical feasibility of a pure mathematical representation of the set of all phylogenetic trees, based on tropical geometry, for descriptive and inferential statistics as well as unsupervised and supervised machine learning. Our exploration is both theoretical and practical. We show that the tropical geometric phylogenetic tree space, endowed with a generalized Hilbert projective metric, exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics and that allow well-defined questions to be posed. We illustrate the statistical feasibility of the tropical geometric perspective for phylogenetic trees with examples of both a descriptive and an inferential statistical task. Moreover, this approach exhibits increased computational efficiency and statistical performance over the current state of the art, which we illustrate with a real data example on seasonal influenza. Our results demonstrate the viability of the tropical geometric setting for parametric statistical and probabilistic studies of sets of phylogenetic trees.
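As a minimal sketch of the metric named above, assume each tree is encoded as the vector of its pairwise leaf-to-leaf distances. The generalized Hilbert projective metric (often called the tropical metric) between two such vectors u and v is then max_i(u_i - v_i) - min_i(u_i - v_i). The function name and the example vectors below are hypothetical illustrations, not taken from the authors' code or data.

```python
from typing import Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Generalized Hilbert projective (tropical) distance between two vectors.

    Computes max_i(u_i - v_i) - min_i(u_i - v_i). Adding the same constant to
    every coordinate of either vector leaves the result unchanged.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("u and v must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


# Hypothetical pairwise distances for two trees on leaves A, B, C,
# ordered as (d_AB, d_AC, d_BC). These values are made up for illustration.
tree_1 = [2.0, 4.0, 4.0]
tree_2 = [4.0, 4.0, 2.0]

distance = tropical_distance(tree_1, tree_2)

# Shifting every coordinate by the same constant gives distance zero,
# which is why the metric is defined on vectors modulo constant shifts.
shifted = [x + 1.0 for x in tree_1]
shift_distance = tropical_distance(shifted, tree_1)
```

Because the distance ignores constant shifts, it is a metric only on the quotient of Euclidean space by the all-ones direction. This sketch does not check that the input vectors are valid tree metrics.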