A well-known metric for quantifying the similarity between two clusterings is the adjusted mutual information. Compared to mutual information, it introduces a corrective term based on random permutations of the labels, which prevents two clusterings from appearing similar by chance. Unfortunately, this adjustment makes the metric computationally expensive. In this paper, we propose a novel adjustment based on pairwise label permutations instead of full label permutations. Specifically, we consider permutations where only two samples, selected uniformly at random, exchange their labels. We show that the corresponding adjusted metric, which can be expressed explicitly, behaves similarly to the standard adjusted mutual information for assessing the quality of a clustering, while having a much lower time complexity. Both metrics are compared in terms of quality and performance in experiments on synthetic and real data.
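To make the idea of a pairwise label permutation concrete, the following Python sketch computes the mutual information between two labelings and estimates its expected value when two samples, chosen uniformly at random, exchange their labels. It is an illustration only and not the explicit expression derived in the paper: the expectation is approximated by Monte Carlo sampling, the function names (`mutual_information`, `expected_mi_pairwise_swap`) and the parameters `n_trials` and `seed` are hypothetical, and taking the adjusted score as the plain difference between the two quantities is an assumption made here for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same samples."""
    n = len(a)
    count_a = Counter(a)          # cluster sizes in the first clustering
    count_b = Counter(b)          # cluster sizes in the second clustering
    count_ab = Counter(zip(a, b))  # contingency table (non-empty cells only)
    mi = 0.0
    for (x, y), n_xy in count_ab.items():
        mi += (n_xy / n) * math.log(n * n_xy / (count_a[x] * count_b[y]))
    return mi


def expected_mi_pairwise_swap(a, b, n_trials=2000, seed=0):
    """Monte Carlo estimate of the expected mutual information after two
    uniformly chosen samples exchange their labels in b.

    Assumption: this sampling loop is a stand-in for the closed-form
    expression mentioned in the abstract, which is not reproduced here.
    """
    rng = random.Random(seed)
    n = len(a)
    b = list(b)  # work on a copy so the caller's labels are untouched
    total = 0.0
    for _ in range(n_trials):
        i, j = rng.sample(range(n), 2)  # two distinct samples
        b[i], b[j] = b[j], b[i]         # pairwise label permutation
        total += mutual_information(a, b)
        b[i], b[j] = b[j], b[i]         # undo the swap
    return total / n_trials


# Toy example with two clusterings of six samples.
labels_a = [0, 0, 0, 1, 1, 1]
labels_b = [0, 0, 1, 1, 1, 1]
mi = mutual_information(labels_a, labels_b)
baseline = expected_mi_pairwise_swap(labels_a, labels_b)
# Assumed form of the adjustment: raw score minus the chance-level term.
print(mi, baseline, mi - baseline)
```

A single swap leaves the cluster sizes unchanged and alters at most four cells of the contingency table, which suggests why an explicit expression for the corrective term can be much cheaper to evaluate than an expectation over full label permutations.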