This paper addresses the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node, and each internal node is associated with all the data items stored in the subtree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method that combines the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Unlike existing Bayesian approaches, our method handles data with complex latent mixture features, a setting that has not previously been explored in the literature. We describe the model and its inference procedure in detail. Experiments on three datasets show that our method achieves strong empirical results compared with existing algorithms.
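To make the tree structure concrete, the following is a minimal sketch of the standard nCRP prior over root-to-leaf paths. It is not the combined nCRP and HDP model described here: it omits the HDP component, the likelihood, and inference entirely. The function names `ncrp_paths` and `level_partition`, the fixed tree depth, and the concentration parameter `gamma` are illustrative assumptions. The sketch shows why each item lands in exactly one leaf and why grouping items by their node at a given level yields a partition of the dataset.

```python
import random
from collections import defaultdict


def ncrp_paths(n_items, depth, gamma, seed=0):
    """Hypothetical helper: draw one root-to-leaf path per item from a nested CRP prior.

    A path is a tuple of child ids; its prefix of length k identifies the
    node the item passes through at level k (the root is the empty tuple).
    """
    rng = random.Random(seed)
    # counts[node][child] = number of items routed from `node` into `child`
    counts = defaultdict(lambda: defaultdict(int))
    paths = []
    for _ in range(n_items):
        node = ()  # every item starts at the root
        for _ in range(depth):
            children = counts[node]
            total = sum(children.values())
            # CRP step: existing child c w.p. n_c / (total + gamma),
            # a new child w.p. gamma / (total + gamma)
            r = rng.uniform(0, total + gamma)
            chosen = None
            acc = 0.0
            for child, n_c in children.items():
                acc += n_c
                if r < acc:
                    chosen = child
                    break
            if chosen is None:
                chosen = len(children)  # ids are 0..k-1, so this id is unused
            children[chosen] += 1
            node = node + (chosen,)
        paths.append(node)  # exactly one leaf per item
    return paths


def level_partition(paths, level):
    """Hypothetical helper: group item indices by their node at `level` (1-based)."""
    groups = defaultdict(list)
    for i, p in enumerate(paths):
        groups[p[:level]].append(i)
    return list(groups.values())


if __name__ == "__main__":
    paths = ncrp_paths(10, depth=3, gamma=1.0)
    for lvl in range(1, 4):
        print(lvl, level_partition(paths, lvl))
```

Because every item follows a single path, the groups at each level are disjoint and cover all items, and each group at level k+1 is contained in one group at level k, which is the nested, non-overlapping structure the abstract describes.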