Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart`` the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.
翻译:基于排名的链接是一种根据对象之间的关系对集合进行总结的新工具。这些对象没有被映射为向量,它们之间的“相似性”不需要是数值的或对称的。对象只需使用 Comparator 对自身附近的对象进行相似性排序,其中 Comparator 是可传递的,但不一定与整个集合上的任何度量一致。将其称为 $S$ 上的排名系统。基于排名的链接应用于从排名系统派生的 $K$-最近邻有向图。计算发生在一个二维的抽象 orientated simplicial 复合体上,其面在 $S$ 的线图中的点、边和三角形之间。在 $|S|K^2$ 步骤中,它构建了一个加权链接图 $(S,\mathcal{L},\sigma)$,其中 $\sigma(\{x, y\})$ 被称为对象 $x$ 和 $y$ 之间的 in-sway。将 $\mathcal{L}_t$ 视为 in-sway 至少为 $t$ 的链接,并将 $S$ 划分为 $(S,\mathcal{L}_t)$ 的图分量,针对变化的 $t$。基于排名的链接是从有向图的一类到分割集合的一个函数子,它的实际结果是以排名原则为基础进行一次集合增量,可以得到一个新的聚类,不会“撕裂”前一个聚类。同样适用于度量空间上的单连接聚类,但对于典型的基于优化的方法则不是这样。最后一节介绍了开放的组合问题。