Multiple imputation (MI) is a popular method for dealing with missing values. However, the appropriate way to apply clustering after MI remains unclear: how should the partitions obtained from the imputed data sets be pooled, and how should clustering instability be assessed when data are incomplete? By answering both questions, this paper proposes a complete view of clustering with missing data using MI. The problem of pooling partitions is addressed using consensus clustering, while bootstrap theory is used to explain how to assess the instability related to both the observed and the missing data. The new rules for pooling partitions and assessing instability are theoretically justified and extensively studied by simulation. Pooling partitions improves accuracy, while measuring instability with missing data widens the possibilities for data analysis: it allows the dependence of the clustering on the imputation model to be assessed, and it provides a convenient way to choose the number of clusters when data are incomplete, as illustrated on a real data set.
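To make the pooling idea concrete, the following is a minimal sketch, not the paper's procedure. It assumes each imputed data set has already been clustered, giving M label vectors over the same observations. The consensus step swaps in a simple co-association rule (link two observations when they share a cluster in a strict majority of the partitions, then take connected components) in place of whatever consensus method the paper uses. The disagreement measure is only a rough stand-in for the between-imputation part of instability and involves no bootstrap. All function names are hypothetical.

```python
from itertools import combinations


def co_association_counts(partitions):
    """Count, for each pair of observations, how many partitions co-cluster them."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def pool_partitions(partitions):
    """Consensus partition (assumed rule): connected components of the graph
    linking pairs that are co-clustered in a strict majority of partitions."""
    n = len(partitions[0])
    m = len(partitions)
    counts = co_association_counts(partitions)
    consensus = [-1] * n  # -1 marks observations not yet assigned
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and 2 * counts[i][j] > m:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus


def rand_disagreement(a, b):
    """Share of observation pairs on which two partitions disagree (1 - Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def between_imputation_instability(partitions):
    """Average pairwise disagreement among the M partitions (a simplification)."""
    pairs = list(combinations(partitions, 2))
    return sum(rand_disagreement(a, b) for a, b in pairs) / len(pairs)


# Illustrative labels from M = 3 imputed data sets over 6 observations.
# The second partition equals the first up to label switching.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(pool_partitions(partitions))  # [0, 0, 0, 1, 1, 1]
print(round(between_imputation_instability(partitions), 3))  # 0.222
```

Working on co-membership of pairs rather than on raw labels makes the pooling insensitive to label switching across imputed data sets, which is why the second partition counts as identical to the first.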