In the framework of model-based clustering, a model called multi-partition clustering, which allows several latent class variables, has been proposed. This model assumes that the distribution of the observed data factorizes into several independent blocks of variables, each block following its own mixture model. In this paper, we assume that each block follows a non-parametric latent class model, {\it i.e.} the variables are independent within each component of the mixture, with no parametric assumption on their class-conditional distributions. The purpose is to infer, from an observed sample, the number of blocks, the partition of the variables into blocks, and the number of components in each block, which together characterise the proposed model. Following recent literature on model and variable selection in non-parametric mixture models, we propose to discretize the data into bins. This makes it possible to apply the classical multi-partition clustering procedure for parametric multinomial distributions, which is based on a penalized likelihood method (\emph{e.g.} BIC). We establish the consistency of the procedure and propose an efficient optimization method. The performance of the model is investigated on simulated data.
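To make the discretize-then-select idea concrete, here is a minimal Python sketch of how a candidate partition of the variables could be scored. It is an illustration under stated assumptions, not the procedure of the paper: the abstract does not specify the binning rule, the penalty, or the optimization, so the quantile-based bins, the standard BIC penalty, the plain EM fit, and all function names (`discretize`, `fit_block`, `bic_block`, `bic_partition`) are hypothetical choices made for this example.

```python
import numpy as np


def discretize(X, n_bins):
    """Map each column to integer bin labels 0..n_bins-1.

    Assumption: bins are cut at empirical quantiles; the abstract does not
    say how the bins are chosen.
    """
    X = np.asarray(X, dtype=float)
    Z = np.empty(X.shape, dtype=int)
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]  # interior cut levels
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        Z[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return Z


def fit_block(Z, n_bins, K, n_iter=100, seed=0):
    """Plain EM for a K-component multinomial latent class model on one block.

    Returns the log-likelihood at the last parameter estimate. A tiny
    smoothing constant keeps the logarithms finite, so this is only an
    approximate maximum-likelihood fit.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    onehot = np.eye(n_bins)[Z]                  # shape (n, d, n_bins)
    resp = rng.dirichlet(np.ones(K), size=n)    # random soft assignments
    for _ in range(n_iter):
        # M-step: mixing proportions and per-variable bin probabilities
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        alpha = np.einsum("nk,ndr->kdr", resp, onehot) + 1e-6
        alpha /= alpha.sum(axis=2, keepdims=True)
        # E-step: variables are independent within each component
        log_joint = np.log(pi) + np.einsum("ndr,kdr->nk", onehot, np.log(alpha))
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
    return float(log_norm.sum())


def bic_block(Z, n_bins, K):
    """Log-likelihood minus the standard BIC penalty (an assumed penalty)."""
    n, d = Z.shape
    n_params = (K - 1) + K * d * (n_bins - 1)
    return fit_block(Z, n_bins, K) - 0.5 * n_params * np.log(n)


def bic_partition(Z, n_bins, blocks, Ks):
    """Blocks are independent, so the criterion is a sum over blocks."""
    return sum(bic_block(Z[:, list(b)], n_bins, K) for b, K in zip(blocks, Ks))
```

Because the blocks are assumed independent, the criterion decomposes as a sum of per-block scores. Comparing `bic_partition` across candidate partitions and numbers of components is then one way to carry out the selection; an exhaustive comparison would be costly, which is presumably why an efficient optimization is needed.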