We consider the numerical taxonomy problem of fitting a positive distance function ${D:{S\choose 2}\rightarrow \mathbb R_{>0}}$ by a tree metric. We want a tree $T$ with positive edge weights that includes $S$ among its vertices, such that the distances between the points of $S$ in $T$ match those in $D$. A notable application is in evolutionary biology, where the tree $T$ aims to approximate the branching process leading to the observed distances in $D$ [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial-time algorithm that minimizes the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics, where a root has the same distance to all vertices in $S$. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was $O((\log n)(\log \log n))$ by Ailon and Charikar [2005], who wrote "Determining whether an $O(1)$ approximation can be obtained is a fascinating question".
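To make the objective concrete, the following minimal sketch evaluates the total error of a candidate tree, assuming the distance error of a pair is the absolute difference between its distance in $D$ and its distance in $T$. It only scores a given tree and is not the approximation algorithm itself. The names `tree_distances` and `total_error`, the edge-dictionary representation, and the toy instance are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def tree_distances(edges, source):
    """Distances from `source` in a weighted tree given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj.get(u, []):
            if v not in dist:  # in a tree, each vertex is reached by one path
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def total_error(D, edges, S):
    """Sum over all pairs {u, v} of S of |D(u, v) - d_T(u, v)| (assumed error)."""
    dist_from = {u: tree_distances(edges, u) for u in S}
    return sum(abs(D[frozenset((u, v))] - dist_from[u][v])
               for u, v in combinations(S, 2))


# Hypothetical toy instance: three points attached to an extra centre "x".
S = ["a", "b", "c"]
D = {frozenset(("a", "b")): 3, frozenset(("a", "c")): 5, frozenset(("b", "c")): 5}
T = {("x", "a"): 1, ("x", "b"): 2, ("x", "c"): 3}
err = total_error(D, T, S)
```

In this toy instance the tree distances are 3, 4 and 5 for the pairs ab, ac and bc, so only the pair ac contributes to the error. The centre `x` is a vertex of $T$ outside $S$, which the problem statement permits.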