In this work, we present a novel method for divisive hierarchical variable clustering. A cluster is a group of elements that are more similar to one another than to elements outside the cluster. The correlation coefficient is a natural measure of similarity between variables, so in a correlation matrix a cluster appears as a block of variables whose internal correlations exceed their external ones. Our approach provides a nonparametric way to identify such block structures in the correlation matrix using the singular vectors of the underlying data matrix. When divisively clustering $p$ variables, there are $2^{p-1}$ possible splits. Using the singular vectors for cluster identification, we reduce this number to at most $p(p-1)$, which makes the method computationally efficient. We elaborate on the methodology and outline how dissimilarity measures and linkage functions are incorporated to assess distances between clusters. Additionally, we demonstrate that these distances are ultrametric, which ensures that the resulting hierarchical cluster structure can be uniquely represented by a dendrogram whose heights are interpretable. To validate the efficiency of our method, we perform simulation studies and analyze real-world data on personality traits and cognitive abilities. Supplementary materials for this article can be accessed online.
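As an illustration only, the following Python sketch shows one way a singular-vector-based search could keep the number of candidate splits within $p(p-1)$. It assumes that each of the at most $p$ right singular vectors of the standardized data matrix orders the variables by absolute loading, and that each of the $p-1$ cut points along that ordering gives one candidate split. The function names `candidate_splits` and `cross_correlation`, the ordering rule, and the scoring by mean absolute between-block correlation are all assumptions made for this sketch and are not taken from the article.

```python
import numpy as np


def candidate_splits(X):
    """Enumerate two-way splits of the columns of X (hypothetical helper).

    Assumption: variables are ordered by absolute loading on each right
    singular vector, and every cut point of that ordering is a candidate.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)          # standardize each variable
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    p = X.shape[1]
    splits = set()
    for v in Vt:                               # at most p singular vectors
        order = np.argsort(-np.abs(v))         # largest |loading| first
        for k in range(1, p):                  # p - 1 cut points per vector
            left = frozenset(order[:k].tolist())
            right = frozenset(order[k:].tolist())
            splits.add(frozenset((left, right)))
    return splits                              # at most p * (p - 1) splits


def cross_correlation(R, split):
    """Mean absolute correlation between the two blocks (stand-in score)."""
    left, right = (sorted(block) for block in split)
    return np.abs(R[np.ix_(left, right)]).mean()


# Synthetic data: two independent latent factors drive blocks of 4 and 3 variables.
rng = np.random.default_rng(0)
n = 2000
f1 = rng.standard_normal((n, 1))
f2 = rng.standard_normal((n, 1))
X = np.hstack([
    f1 + 0.5 * rng.standard_normal((n, 4)),   # variables 0..3
    f2 + 0.5 * rng.standard_normal((n, 3)),   # variables 4..6
])

R = np.corrcoef(X, rowvar=False)
splits = candidate_splits(X)
best = min(splits, key=lambda s: cross_correlation(R, s))
print(len(splits), [sorted(block) for block in best])
```

Applying the same search recursively to each resulting block would produce a divisive hierarchy; the dissimilarity and linkage choices that make the heights ultrametric are described in the article itself and are not reproduced in this sketch.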