The Average Silhouette Width (ASW; Rousseeuw (1987)) is a popular cluster validation index for estimating the number of clusters. Here we address the question of whether it is also suitable as a general objective function to be optimized for finding a clustering. We propose two algorithms (the standard version OSil and a fast version FOSil) and compare them with existing clustering methods in an extensive simulation study covering the cases of a known and an unknown number of clusters. Real data sets are also analysed, partly to explore the use of the new methods with non-Euclidean distances. We also show that the ASW satisfies some axioms that have been proposed for cluster quality functions (Ackerman and Ben-David (2009)). The new methods prove useful and sensible in many cases, but some weaknesses are also highlighted. These weaknesses also concern the use of the ASW for estimating the number of clusters together with other methods, which is of general interest due to the popularity of the ASW for this task.
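For readers unfamiliar with the index, the following is a minimal sketch of how the ASW of a given clustering can be computed from Rousseeuw's (1987) definition. It is not the OSil or FOSil algorithm, which are not specified in this abstract. The function name `average_silhouette_width`, the toy data, and the choice of Euclidean distance are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance; another dissimilarity could be substituted


def average_silhouette_width(points, labels):
    """Return the ASW of the clustering `labels` of `points` (hypothetical helper)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the ASW is only defined for at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Rousseeuw's convention: a singleton cluster contributes width 0.
            continue
        # a: mean distance from point i to the other members of its own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from point i to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        # Guard against identical points, where a = b = 0.
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


if __name__ == "__main__":
    # Toy data: two well-separated pairs of points (illustrative only).
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(average_silhouette_width(pts, [0, 0, 1, 1]))  # natural grouping
    print(average_silhouette_width(pts, [0, 1, 0, 1]))  # grouping that mixes the pairs
```

Using the ASW as an objective function, as considered here, amounts to searching over labelings for one that makes this value as large as possible. Estimating the number of clusters with the ASW instead compares the value across clusterings that have different numbers of clusters.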