Selecting the number of clusters is one of the key processes when applying clustering algorithms. To fulfill this task, various cluster validity indices (CVIs) have been introduced. Most of the cluster validity indices are defined to detect the optimal number of clusters hidden in a dataset. However, users sometimes do not expect to get the optimal number of groups but a secondary one which is more reasonable for their applications. This has motivated us to introduce a Bayesian cluster validity index (BCVI) based on existing underlying indices. This index is defined based on either Dirichlet or Generalized Dirichlet priors which result in the same posterior distribution. Our BCVI is then tested based on the Wiroonsri index (WI), and the Wiroonsri-Preedasawakul index (WP) as underlying indices for hard and soft clustering, respectively. We compare their outcomes with the original underlying indices, as well as a few more existing CVIs including Davies and Bouldin (DB), Starczewski (STR), Xie and Beni (XB), and KWON2 indices. Our proposed BCVI clearly benefits the use of CVIs when experiences matter where users can specify their expected range of the final number of clusters. This aspect is emphasized by our experiment classified into three different cases. Finally, we present some applications to real-world datasets including MRI brain tumor images. Our tools will be added to a new version of the recently developed R package ``UniversalCVI''.
翻译:暂无翻译