Estimation of the number of components (or order) of a finite mixture model is a long-standing and challenging problem in statistics. We propose the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models. Unlike methods that fit and compare mixtures of varying orders using criteria involving model complexity, our approach directly penalizes a continuous function of the model parameters. More specifically, given a conservative upper bound on the order, the GSF groups and sorts the mixture component parameters in order to fuse those that are redundant. For a wide range of finite mixture models, we show that the GSF is consistent in estimating the true mixture order and achieves the $n^{-1/2}$ convergence rate for parameter estimation up to polylogarithmic factors. The GSF is implemented for several univariate and multivariate mixture models in the R package GroupSortFuse. Its finite sample performance is supported by a thorough simulation study, and its application is illustrated on two real data examples.
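To give some intuition for the sort-and-fuse idea, the following is a minimal Python sketch for the univariate case. It is not the GSF estimator: the GSF fuses components through a penalty inside the likelihood, whereas this toy version applies a hard threshold to already-fitted component means. The function name `sort_and_fuse`, the threshold `tol`, and the example values are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def sort_and_fuse(
    means: List[float], weights: List[float], tol: float
) -> List[Tuple[float, float]]:
    """Toy illustration: sort univariate component means, then merge neighbors.

    Assumes all weights are strictly positive. This is a hard-threshold
    post-processing step, not the penalized likelihood used by the GSF.
    """
    # Sort component indices by their mean so redundant components are adjacent.
    order = sorted(range(len(means)), key=lambda i: means[i])

    fused: List[Tuple[float, float]] = []
    for i in order:
        m, w = means[i], weights[i]
        if fused and m - fused[-1][0] <= tol:
            # Merge into the previous component: add the weights and
            # replace the mean by the weight-averaged mean.
            prev_m, prev_w = fused[-1]
            total = prev_w + w
            fused[-1] = ((prev_m * prev_w + m * w) / total, total)
        else:
            # Gap is larger than tol: start a new component.
            fused.append((m, w))
    return fused


# Hypothetical overfitted mixture: four components, two of them redundant.
fitted = sort_and_fuse(
    means=[0.0, 5.0, 0.02, 5.01],
    weights=[0.3, 0.2, 0.2, 0.3],
    tol=0.1,
)
estimated_order = len(fitted)
```

Here the upper bound on the order is four, and the number of components that remain after fusing plays the role of the estimated order. In the actual procedure, the degree of fusion is governed by the penalty and its tuning parameters rather than by a fixed threshold.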