We are concerned with clustering continuous data sets subject to non-ignorable missingness. We perform clustering with a specific semi-parametric mixture, under the assumption of conditional independence given the component. The mixture model is used for clustering and not for estimating the density of the full variables (observed and unobserved); thus we need neither additional assumptions on the component distributions nor a specification of the missingness mechanism. Estimation is performed by maximizing an extension of the smoothed likelihood that allows for missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data, which we illustrate on a real data set.
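
To make the setting concrete, the following is a minimal sketch of clustering with a mixture whose components factorize over variables and are fitted only on the observed entries. It is not the Majorization-Minorization algorithm of the paper: it uses a simpler EM-like update with weighted kernel density estimates, and it does not maximize the smoothed likelihood. The function name `cluster_with_missing`, the Gaussian kernel, the fixed bandwidth, the quantile-based starting point, and the per-component probability of observing each variable are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def cluster_with_missing(X, n_components=2, bandwidth=0.5, n_iter=30):
    """EM-like clustering sketch; X is (n, d) with np.nan for missing entries.

    Returns (proportions, responsibilities). Illustrative only.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    observed = ~np.isnan(X)

    # Crude deterministic start (an assumption): rank rows by the mean of
    # their observed entries and cut the ranking into equal-sized groups.
    counts = np.maximum(observed.sum(axis=1), 1)
    row_mean = np.where(observed, X, 0.0).sum(axis=1) / counts
    ranks = np.argsort(np.argsort(row_mean))
    start = ranks * n_components // n
    resp = np.full((n, n_components), 0.2 / max(n_components - 1, 1))
    resp[np.arange(n), start] = 0.8
    resp /= resp.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        proportions = resp.mean(axis=0)
        log_post = np.tile(np.log(proportions + 1e-300), (n, 1))
        for j in range(d):
            obs = observed[:, j]
            r_obs = resp[obs]  # responsibilities of rows where variable j is observed
            # Component-specific probability that variable j is observed.
            p_obs = r_obs.sum(axis=0) / (resp.sum(axis=0) + 1e-300)
            p_obs = np.clip(p_obs, 1e-6, 1.0 - 1e-6)
            # Rows where variable j is missing only contribute the indicator.
            log_post[~obs] += np.log1p(-p_obs)
            if obs.any():
                xj = X[obs, j]
                # Weighted kernel density estimate of variable j per component,
                # evaluated at the observed values themselves.
                kern = gaussian_kernel((xj[:, None] - xj[None, :]) / bandwidth)
                kern /= bandwidth
                dens = kern @ r_obs / (r_obs.sum(axis=0) + 1e-300)
                log_post[obs] += np.log(p_obs) + np.log(dens + 1e-300)
        # Normalize in log space to obtain the updated responsibilities.
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return resp.mean(axis=0), resp
```

Calling `cluster_with_missing(X, n_components=2)` on an array with `np.nan` entries returns the estimated proportions and an `(n, 2)` matrix of responsibilities; taking the row-wise `argmax` gives a hard partition. Because the missingness indicator enters the posterior through a component-specific probability, rows with every entry missing can still be assigned to a cluster in this sketch.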