ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric that characterizes the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric in the space of all clusterings of a fixed set of (weighted) items. The JaccardIndex is the complementary metric that characterizes the similarity of two clusterings. Its relationship with the JaccardDistance is simple: JaccardDistance + JaccardIndex = 1. This paper decomposes the JaccardDistance and the JaccardIndex further. In each case, the decomposition yields Impact and Quality metrics. The Impact metrics measure aspects of the magnitude of the clustering diff, while Quality metrics use human judgements to measure how much the clustering diff improves the quality of the clustering. The decompositions of this paper offer more and deeper insight into a clustering change. They also unlock new techniques for debugging and exploring the nature of the clustering diff. The new metrics are mathematically well-behaved and they are interrelated via simple equations. While the work can be seen as an alternative formal framework for ABCDE, we prefer to view it as complementary. It certainly offers a different perspective on the magnitude and the quality of a clustering change, and users can use whatever they want from each approach to gain more insight into a change.
翻译:暂无翻译