We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition the data in each group and discover grouping patterns among the groups in a potentially large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, each endowed with a Wasserstein distance metric. By exploiting the connection to the problem of finding Wasserstein barycenters, we develop several variants of this formulation that admit fast optimization algorithms. Consistency properties are established for the estimates of both local and global clusters. Finally, experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach.
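To make the notions of Wasserstein distance and Wasserstein barycenter between discrete measures concrete, the following is a minimal sketch of the simplest special case: uniform empirical measures on the real line with the same number of atoms, where both quantities reduce to operations on sorted atoms. It is an illustration under these stated assumptions, not the algorithm proposed in the paper; the function names `w2_distance_1d` and `w2_barycenter_1d` are hypothetical.

```python
import math


def w2_distance_1d(xs, ys):
    """2-Wasserstein distance between two uniform empirical measures on the
    real line with equally many atoms (assumed special case).

    In one dimension the optimal coupling matches sorted atoms in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def w2_barycenter_1d(measures, weights=None):
    """2-Wasserstein barycenter of uniform empirical measures on the real
    line, each with the same number of atoms (assumed special case).

    In one dimension the barycenter is obtained by averaging quantiles,
    i.e. taking the weighted mean of the i-th smallest atoms.
    """
    if weights is None:
        weights = [1.0 / len(measures)] * len(measures)
    sorted_measures = [sorted(m) for m in measures]
    n = len(sorted_measures[0])
    if any(len(m) != n for m in sorted_measures):
        raise ValueError("this sketch assumes equally many atoms")
    return [
        sum(w * m[i] for w, m in zip(weights, sorted_measures))
        for i in range(n)
    ]


# Two groups of one-dimensional observations, each seen as a discrete measure.
group_a = [0.0, 2.0]
group_b = [4.0, 6.0]
center = w2_barycenter_1d([group_a, group_b])
```

Here `center` sits at equal Wasserstein distance from both groups. In higher dimensions, or with unequal weights and atom counts, neither quantity has this closed form and an optimal transport solver would be needed.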