Comprehensive benchmarking of clustering algorithms is rendered difficult by two key factors: (i)~the elusiveness of a unique mathematical definition of this unsupervised learning approach and (ii)~dependencies between the generating models or clustering criteria adopted by some clustering algorithms and the indices used for internal cluster validation. Consequently, there is no consensus regarding the best practice for rigorous benchmarking, or whether such benchmarking is possible at all outside the context of a given application. Here, we argue that synthetic datasets must continue to play an important role in the evaluation of clustering algorithms, but that this necessitates constructing benchmarks that appropriately cover the diverse set of properties that impact clustering algorithm performance. Through our framework, HAWKS, we demonstrate the important role evolutionary algorithms can play in supporting the flexible generation of such benchmarks, allowing simple modification and extension. We illustrate two possible uses of our framework: (i)~the evolution of benchmark data consistent with a set of hand-derived properties and (ii)~the generation of datasets that tease out performance differences between a given pair of algorithms. Our work has implications for the design of clustering benchmarks that sufficiently challenge a broad range of algorithms, and for furthering insight into the strengths and weaknesses of specific approaches.
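To make the second use case concrete, the following is a minimal, self-contained sketch of the underlying idea, not the authors' actual encoding, operators, or the HAWKS API. It assumes, for illustration only, that a dataset is parameterised by spherical Gaussian cluster means and spreads, that the fitness is the gap in Adjusted Rand Index between two algorithms (K-means vs.\ single-linkage here), and that a simple (1+1)-style hill climb stands in for the full evolutionary algorithm.

```python
# Illustrative sketch (NOT the HAWKS API): evolve Gaussian cluster parameters
# to maximise the performance gap between two clustering algorithms,
# measured by the Adjusted Rand Index (ARI) against the generating labels.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)
N_CLUSTERS, POINTS_PER_CLUSTER, DIMS = 3, 50, 2

def sample_dataset(means, spreads):
    """Draw points from spherical Gaussians defined by the genome."""
    data, labels = [], []
    for k, (mu, scale) in enumerate(zip(means, spreads)):
        data.append(rng.normal(mu, scale, size=(POINTS_PER_CLUSTER, DIMS)))
        labels += [k] * POINTS_PER_CLUSTER
    return np.vstack(data), np.array(labels)

def fitness(genome):
    """ARI gap between K-means and single-linkage on the same dataset."""
    means, spreads = genome
    X, y = sample_dataset(means, spreads)
    ari_a = adjusted_rand_score(y, KMeans(N_CLUSTERS, n_init=10).fit_predict(X))
    ari_b = adjusted_rand_score(
        y, AgglomerativeClustering(N_CLUSTERS, linkage="single").fit_predict(X))
    return abs(ari_a - ari_b)

# (1+1)-style hill climb: mutate the genome, keep improvements.
# Fitness is noisy (data are resampled each call), which is fine for a toy demo.
genome = (rng.normal(0, 5, size=(N_CLUSTERS, DIMS)), np.ones(N_CLUSTERS))
best = fitness(genome)
for _ in range(30):
    means, spreads = genome
    child = (means + rng.normal(0, 0.5, means.shape),
             np.clip(spreads * rng.lognormal(0, 0.2, spreads.shape), 0.1, None))
    if (f := fitness(child)) > best:
        genome, best = child, f
print(f"Best ARI gap found: {best:.3f}")
```

A genome reaching a large ARI gap corresponds to a dataset on which one algorithm succeeds while the other fails, which is precisely the kind of discriminative benchmark instance the abstract describes.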