A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. Although clustering and ordering methods can yield similar characterizations of a dataset, the former has been studied much more actively than the latter, particularly for data represented as graphs. This study fills this gap by investigating the methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and a partition for a set of elements. Using synthetic and real-world datasets, we evaluate the extent to which an ordering method identifies a module structure and a clustering method identifies a banded structure.
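As a rough illustration of what consistency between a sequence and a partition can mean, the sketch below counts how often the group label changes between neighbors in an ordering, beyond the unavoidable minimum. This is only a simple proxy chosen for illustration and is not the label continuity error defined in this study. The function names `label_switches` and `excess_switches` and the toy data are hypothetical.

```python
def label_switches(sequence, labels):
    """Count adjacent pairs in `sequence` whose group labels differ."""
    return sum(
        1 for a, b in zip(sequence, sequence[1:]) if labels[a] != labels[b]
    )


def excess_switches(sequence, labels):
    """Label switches beyond the minimum possible for this partition.

    An ordering that keeps every group contiguous needs exactly
    (number of groups - 1) switches, so it scores 0 here.
    Illustrative proxy only, not the measure proposed in the paper.
    """
    n_groups = len({labels[x] for x in sequence})
    return label_switches(sequence, labels) - max(n_groups - 1, 0)


# Toy partition of five elements into three groups (assumed data).
labels = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}

consistent = ["a", "b", "c", "d", "e"]  # each group is contiguous
mixed = ["a", "c", "b", "e", "d"]       # groups are interleaved

print(excess_switches(consistent, labels))
print(excess_switches(mixed, labels))
```

Under this proxy, an ordering that places each group in one contiguous block scores zero, and the score grows as the groups become interleaved along the sequence.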