Recently, Hierarchical Clustering (HC) has been considered through the lens of optimization. In particular, two maximization objectives have been defined. Moseley and Wang defined the \emph{Revenue} objective to handle similarity information given by a weighted graph on the data points (w.l.o.g., $[0,1]$ weights), while Cohen-Addad et al. defined the \emph{Dissimilarity} objective to handle dissimilarity information. In this paper, we prove structural lemmas for both objectives allowing us to convert any HC tree to a tree with a constant number of internal nodes while incurring an arbitrarily small loss in each objective. Although the best-known approximations are 0.585 and 0.667 respectively, using our lemmas we obtain approximations arbitrarily close to 1 whenever not all weights are small (i.e., there exist constants $\epsilon, \delta$ such that the fraction of weights smaller than $\delta$ is at most $1 - \epsilon$); such instances encompass many metric-based similarity instances, thereby improving upon prior work. Finally, we introduce Hierarchical Correlation Clustering (HCC) to handle instances that contain similarity and dissimilarity information simultaneously. For HCC, we provide an approximation of 0.4767 and, for complementary similarity/dissimilarity weights (analogous to $+/-$ correlation clustering), we again present nearly-optimal approximations.
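For concreteness, a standard formulation of the two objectives is sketched below (the notation here is ours, not necessarily the paper's): given $n$ data points with pairwise weights $w(i,j) \in [0,1]$ and an HC tree $T$, let $T[i \vee j]$ denote the subtree rooted at the lowest common ancestor of leaves $i$ and $j$. Then
\[
\mathrm{rev}(T) \;=\; \sum_{\{i,j\}} w(i,j)\,\bigl(n - |\mathrm{leaves}(T[i \vee j])|\bigr),
\qquad
\mathrm{dis}(T) \;=\; \sum_{\{i,j\}} w(i,j)\,\bigl|\mathrm{leaves}(T[i \vee j])\bigr|,
\]
where the Revenue objective $\mathrm{rev}(T)$ is maximized when $w$ encodes similarities (similar points should be separated as low in the tree as possible), and the Dissimilarity objective $\mathrm{dis}(T)$ is maximized when $w$ encodes dissimilarities (dissimilar points should be separated as high as possible).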