Phylogenetic trees are key data objects in biology, and methods for phylogenetic reconstruction are highly developed. The space of phylogenetic trees is a nonpositively curved metric space, and statistical methods that exploit this property to analyze sets of trees in this space have recently been developed. Meanwhile, in Euclidean space, log-concave maximum likelihood estimation has emerged as a new nonparametric method for probability density estimation. In this paper, we derive a sufficient condition for the existence and uniqueness of the log-concave maximum likelihood estimator on tree space. We also propose an estimation algorithm for one and two dimensions. Because various factors affect the inferred trees, it is difficult to specify the distribution of sample trees. The class of log-concave densities is nonparametric, yet estimation can be carried out by maximum likelihood without selecting hyperparameters. We numerically compare the estimation performance with that of a previously developed kernel density estimator. In our examples where the true density is log-concave, our estimator has a smaller integrated squared error when the sample size is large. We also conduct numerical clustering experiments using the Expectation-Maximization (EM) algorithm and compare the results with k-means++ clustering based on the Fréchet mean.