This paper addresses full model estimation for non-parametric finite mixture models. It presents an approach for selecting both the number of components and the subset of discriminative variables (i.e., the variables whose distributions differ across the mixture components). The approach discretizes each variable into B bins and penalizes the resulting log-likelihood. Letting the number of bins tend to infinity as the sample size tends to infinity, we prove that our estimator of the model (number of components and subset of relevant variables for clustering) is consistent under a suitable choice of the penalty term. The practical value of the proposal is illustrated on simulated and benchmark data.
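As a rough illustration of the two ingredients named in the abstract (binning each variable and penalizing the log-likelihood), the following minimal sketch may help. It is not the paper's implementation. The equal-width bins, the BIC-type penalty, and the function names `discretize`, `n_free_params`, and `penalized_loglik` are all assumptions made for this example. The abstract only requires "a suitable choice of the penalty term", and the log-likelihood itself would come from fitting the binned mixture (e.g., by EM), which is not shown.

```python
import numpy as np


def discretize(x, n_bins):
    """Map each column of x to integer bin labels in {0, ..., n_bins - 1}.

    Assumption: equal-width bins between each column's min and max.
    """
    x = np.asarray(x, dtype=float)
    labels = np.empty(x.shape, dtype=int)
    for j in range(x.shape[1]):
        edges = np.linspace(x[:, j].min(), x[:, j].max(), n_bins + 1)
        # Use interior edges only, so labels fall in 0..n_bins-1.
        labels[:, j] = np.digitize(x[:, j], edges[1:-1])
    return labels


def n_free_params(n_components, n_relevant, n_vars, n_bins):
    """Count free parameters of the binned mixture (hypothetical helper).

    Relevant variables get one multinomial per component; irrelevant
    variables share a single multinomial across components.
    """
    mixing = n_components - 1
    relevant = n_components * n_relevant * (n_bins - 1)
    irrelevant = (n_vars - n_relevant) * (n_bins - 1)
    return mixing + relevant + irrelevant


def penalized_loglik(loglik, n_obs, n_components, n_relevant, n_vars, n_bins):
    """Penalized criterion with an assumed BIC-type penalty."""
    nu = n_free_params(n_components, n_relevant, n_vars, n_bins)
    return loglik - 0.5 * nu * np.log(n_obs)
```

In use, one would compute this criterion for each candidate pair (number of components, subset of relevant variables) and keep the maximizer. For the consistency result, the number of bins would have to grow with the sample size.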