This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while mitigating some of its pitfalls. First, we note that standard model-based clustering tends to yield the same number of clusters in every margin, which is a rather artificial assumption for many datasets. We tackle this issue by specifying a finite mixture model per margin, which allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm that we call Reign-and-Conquer. Second, because the proposed approach specifies a model only for the margins and leaves the joint distribution unspecified, it is partially parallelizable; it is therefore computationally appealing and more tractable in moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates good overall performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
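To make the per-margin idea concrete, the following is a minimal sketch of the marginal step only: each margin gets its own univariate finite mixture, so different margins can end up with different numbers of clusters. The abstract does not specify the mixture family, the fitting procedure, or the rule for choosing the number of components, so Gaussian components, EM, BIC selection, the variance floor, and every function name here (`fit_mixture_1d`, `cluster_margin`, `normal_grid`) are illustrative assumptions. The sketch does not attempt the Reign-and-Conquer step that combines the marginal clusterings.

```python
import math
from statistics import NormalDist


def component_densities(v, weights, means, variances):
    """Weighted Gaussian density of the scalar v under each mixture component."""
    return [
        w * math.exp(-0.5 * (v - m) ** 2 / s2) / math.sqrt(2.0 * math.pi * s2)
        for w, m, s2 in zip(weights, means, variances)
    ]


def fit_mixture_1d(x, k, n_iter=100):
    """Fit a k-component univariate Gaussian mixture by EM; return (bic, params)."""
    n = len(x)
    xs = sorted(x)
    centre = sum(x) / n
    total_var = max(sum((v - centre) ** 2 for v in x) / n, 1e-12)
    floor = 1e-3 * total_var  # assumed variance floor against degenerate components
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [total_var] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every observation.
        resp = []
        for v in x:
            dens = component_densities(v, weights, means, variances)
            tot = max(sum(dens), 1e-300)
            resp.append([d / tot for d in dens])
        # M-step: update weights, means and variances component by component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(spread, floor)
    loglik = sum(
        math.log(max(sum(component_densities(v, weights, means, variances)), 1e-300))
        for v in x
    )
    # BIC with 3k - 1 free parameters (k means, k variances, k - 1 weights).
    bic = -2.0 * loglik + (3 * k - 1) * math.log(n)
    return bic, (weights, means, variances)


def cluster_margin(x, k_max=4):
    """Choose the number of components by BIC, then label each observation."""
    fits = (fit_mixture_1d(x, k) for k in range(1, k_max + 1))
    _, params = min(fits, key=lambda fit: fit[0])
    labels = []
    for v in x:
        dens = component_densities(v, *params)
        labels.append(dens.index(max(dens)))
    return len(params[0]), labels


def normal_grid(m):
    """Deterministic, Gaussian-shaped sample: m evenly spaced normal quantiles."""
    return [NormalDist().inv_cdf((i + 0.5) / m) for i in range(m)]


# Toy data: the first margin has two well-separated groups, the second has three.
margins = [
    [c + z for c in (-5.0, 5.0) for z in normal_grid(90)],
    [c + z for c in (-10.0, 0.0, 10.0) for z in normal_grid(60)],
]

# Each margin is fitted independently of the others, so this map is the part
# that could be handed to a parallel map (e.g. a process pool) instead.
results = list(map(cluster_margin, margins))
selected_k = [k for k, _ in results]
print(selected_k)
```

Because no margin's fit depends on any other, the `map` over margins is where the partial parallelism mentioned above would enter; how the resulting marginal labels are combined into multivariate clusters is left to the paper's own algorithm.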