We study the natural problem of Triplet Reconstruction (also Rooted Triplets Consistency or Triplet Clustering), originally motivated in computational biology and relational databases (Aho, Sagiv, Szymanski, and Ullman, 1981): given $n$ points, we want to embed them onto the $n$ leaves of a rooted binary tree (a hierarchical clustering or ultrametric embedding) such that a given set of $m$ triplet constraints is satisfied. Triplet $ij|k$ indicates that ``$i, j$ are more closely related to each other than to $k$'' and a tree satisfies $ij|k$ if $d(i,j)$ is the smallest among the 3 distances. Aho et al. (1981) gave an elegant efficient algorithm to find a tree respecting all constraints (if it exists) and it is easy to see that a random binary tree is a 1/3-approximation. Unfortunately, despite more than four decades of research, no better approximation is known. Our main theorem--which captures Triplet Reconstruction as a special case--is a general hardness of approximation result about Constraint Satisfaction Problems (CSPs) over infinite domains (the variables are mapped to any of the $n$ leaves of a tree). Specifically, we prove, under Unique Games (Khot, 2002), that Triplet Reconstruction and more generally, every CSP over hierarchies is approximation resistant (there is no polynomial-time algorithm that does asymptotically better than a biased random assignment). This settles the approximability for many interesting Subtree or Supertree Aggregation Problems. More broadly, our result significantly extends the list of approximation resistant predicates and is a generalization of Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that ordering CSPs are approximation resistant. The main challenge in our analyses stems from the fact that trees have topology which is what determines whether a given triplet constraint on the leaves is satisfied or not.
翻译:本文研究了三元组重建问题(也称为根三元组一致性或三元组聚类问题),该问题起源于计算生物学和关系数据库。给定$n$个点,我们希望将它们嵌入到一个具有$n$个叶节点的有根二叉树(层次聚类或超度嵌入)中,使得给定的$m$个三元组约束被满足。三元组$ij|k$表示“$i,j$比$k$更密切地相关”,如果$d(i,j)$是其中的最小距离,则树满足$ij|k$约束。Aho等人(1981)提出了一种优雅的有效算法,可以找到满足所有约束的树(如果存在),并且可以很容易地看出随机二叉树是1/3-近似解法。不幸的是,尽管经过了四十多年的研究,没有更好的近似解法被发现。我们的主要定理——涵盖三元组重建问题的一种特殊情况——是关于无限域(变量映射到树的任何一个叶节点)的约束满足问题(CSP)的一般近似难度结果。具体而言,我们依据唯一游戏(Khot,2002)证明,三元组重建和更普遍地说,所有基于多级组织结构的约束缩减问题都很难近似(不存在多项式时间的算法比偏差随机赋值更优)。这解决了许多有趣子树或超树聚合问题的可逼近性。更广泛地说,我们的结果显着扩展了不可近似谓词的列表,并且是Guruswami等人(2011)的推广,他们表明,有序CSP是难以近似的。我们分析的主要挑战在于树具有拓扑结构,这是决定树叶上的给定三元组约束是否满足的因素。