We present an objective function for similarity-based hierarchical clustering of partially ordered data that preserves the partial order in the following sense: if $x \leq y$, and $[x]$ and $[y]$ are the respective clusters of $x$ and $y$, then there is an order relation $\leq'$ on the clusters for which $[x] \leq' [y]$. The model differs from existing methods and models for clustering ordered data in two ways: the order relation and the similarity are combined to obtain an optimal hierarchical clustering that seeks to satisfy both, and the order relation is equipped with a pairwise level of comparability in the range $[0,1]$. In particular, if the similarity and the order relation are not aligned, then order preservation may have to yield in favor of clustering. Finding an optimal solution is NP-hard, so we provide a polynomial-time approximation algorithm, with a relative performance guarantee of $O(\log^{3/2} n)$, based on successive applications of directed sparsest cut. The model is an extension of the Dasgupta cost function for divisive hierarchical clustering.
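
For orientation, the original (unordered) Dasgupta cost that the model extends can be sketched as follows. This is only an illustration of the base cost function, not of the order-preserving objective proposed here: each pair of points contributes its similarity multiplied by the number of leaves under the pair's lowest common ancestor in the hierarchy. The nested-tuple tree encoding, the function names `leaves` and `dasgupta_cost`, and the toy similarity values are all assumptions made for the example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves under a nested-tuple tree; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, sim):
    """Sum of sim(i, j) * (number of leaves under the lowest common ancestor of i, j)."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no pairs
    child_leaves = [leaves(c) for c in tree]
    n = sum(len(cl) for cl in child_leaves)
    cost = 0.0
    # Pairs that fall into different children are separated at this node,
    # so this node is their lowest common ancestor.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                cost += sim(i, j) * n
    # Pairs inside the same child are charged further down the tree.
    return cost + sum(dasgupta_cost(c, sim) for c in tree)


# Toy similarity (hypothetical values): a~b and c~d are similar, all else weakly so.
W = {frozenset(("a", "b")): 1.0, frozenset(("c", "d")): 1.0}


def sim(i, j):
    return W.get(frozenset((i, j)), 0.1)


cost_good = dasgupta_cost((("a", "b"), ("c", "d")), sim)  # similar pairs merged low
cost_bad = dasgupta_cost((("a", "c"), ("b", "d")), sim)   # similar pairs split at the root
print(cost_good < cost_bad)
```

A hierarchy that keeps similar points together until low in the tree receives a lower cost, which is why minimizing this quantity favors cutting dissimilar groups apart first.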