We propose methods for the analysis of hierarchical clustering that make full use of the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score, and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lose information because they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods instead use the full structure of the dendrogram. The key insight behind them is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of the tree through an evolutionary model. Experiments on real and simulated datasets provide evidence that our proposed framework produces desirable results and yields more insight than state-of-the-art approaches. We provide an R package that implements our methods.
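To make the phylogeny analogy concrete, the following minimal sketch assigns a feature value to every internal node of a dendrogram. It assumes a Brownian-motion model and uses the weighted-average recursion from Felsenstein's independent contrasts, which may differ from the evolutionary model used in the paper or its R package. The function name `ancestral_values`, the merge-list input format, and the `eps` safeguard are hypothetical choices made for illustration.

```python
def ancestral_values(merges, leaf_values, eps=1e-12):
    """Estimate a feature value at each internal node of a dendrogram.

    Assumption: Brownian-motion evolution along the branches, with branch
    length taken as the difference in dendrogram height between a node
    and its child.

    merges: list of (left_id, right_id, height) in bottom-up merge order.
            Leaves are numbered 0..n-1; the i-th merge creates node n+i.
    leaf_values: observed feature value for each leaf.
    Returns the estimated values of the internal nodes, in merge order.
    """
    n = len(leaf_values)
    value = list(leaf_values)   # estimated feature value per node
    height = [0.0] * n          # dendrogram height per node (leaves sit at 0)
    extra = [0.0] * n           # variance added by estimation error below a node

    for left, right, h in merges:
        # Effective branch length: height difference plus the uncertainty
        # of the child's own estimate. eps guards against zero-length branches.
        v_left = (h - height[left]) + extra[left] + eps
        v_right = (h - height[right]) + extra[right] + eps

        # Inverse-variance weighted average of the two children.
        w_left, w_right = 1.0 / v_left, 1.0 / v_right
        value.append((w_left * value[left] + w_right * value[right])
                     / (w_left + w_right))
        height.append(h)
        extra.append(v_left * v_right / (v_left + v_right))

    return value[n:]


# Toy dendrogram: leaves 0 and 1 merge at height 1.0 into node 3,
# then node 3 merges with leaf 2 at height 4.0 into the root (node 4).
toy_merges = [(0, 1, 1.0), (3, 2, 4.0)]
toy_leaf_values = [0.0, 2.0, 10.0]
internal = ancestral_values(toy_merges, toy_leaf_values)
print([round(v, 3) for v in internal])
```

Because every internal node receives its own value, a feature can be inspected at all resolutions of the tree at once, without first cutting the dendrogram into a single partition.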