Divisive Hierarchical Clustering of Variables Identified by Singular Vectors

In this work, we introduce a novel methodology for divisive hierarchical clustering. Our divisive (``top-down'') approach is motivated by the fact that agglomerative (``bottom-up'') hierarchical clustering, which is commonly used, is not the best choice in every setting. The proposed methodology identifies clusters by approximating the similarity matrix by a block diagonal matrix. Divisively clustering $p$ elements exhaustively requires evaluating $2^{p-1}-1$ possible splits, which is computationally costly; the approximation reduces this number to at most $p(p-1)$ candidates, ensuring computational feasibility. We elaborate on the methodology and describe how linkage functions are incorporated to assess distances between clusters. We further show that these distances are ultrametric, so that the resulting hierarchical cluster structure can be uniquely represented by a dendrogram with interpretable heights. Additionally, the proposed methodology is flexible enough to optimize the objectives of other clustering methods, and it can outperform those methods. It is also applicable for constructing balanced clusters. To validate the efficiency of our approach, we conduct simulation studies and analyze real-world data. Supplementary materials for this article can be accessed online.
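To make the two quantitative claims above concrete, the following is a minimal Python sketch, not the authors' implementation. It verifies the count $2^{p-1}-1$ of two-block splits by brute force, compares it with the bound $p(p-1)$, and checks the ultrametric inequality $d(i,j) \le \max\{d(i,k), d(k,j)\}$ on a small distance matrix. The function names `count_bipartitions` and `is_ultrametric` and the example matrices are illustrative assumptions; they do not come from the article.

```python
from itertools import combinations


def count_bipartitions(p: int) -> int:
    """Count, by brute force, the splits of {0, ..., p-1} into two
    non-empty, unordered blocks (hypothetical helper for illustration)."""
    count = 0
    for size in range(1, p):
        for block in combinations(range(p), size):
            # Require element 0 in the listed block so that each
            # unordered split {A, complement of A} is counted once.
            if 0 in block:
                count += 1
    return count


def is_ultrametric(d, tol=1e-12):
    """Check d[i][j] <= max(d[i][k], d[k][j]) for all triples (i, j, k)."""
    n = len(d)
    return all(
        d[i][j] <= max(d[i][k], d[k][j]) + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


if __name__ == "__main__":
    # Exhaustive number of splits versus the reduced candidate bound.
    for p in (4, 6, 10, 20):
        print(p, 2 ** (p - 1) - 1, p * (p - 1))

    # Distances read off a toy dendrogram: items 0 and 1 join at height 1,
    # item 2 joins them at height 2 (assumed example values).
    dendrogram_heights = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
    # Ordinary distances between the points 0, 1, 2 on a line.
    line_distances = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(is_ultrametric(dendrogram_heights), is_ultrametric(line_distances))
```

Note that for very small $p$ the bound $p(p-1)$ can exceed the exhaustive count (for $p=4$, 12 versus 7); the reduction matters from $p=6$ onward, where the exhaustive count grows exponentially while the bound grows quadratically.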
