Multiple imputation (MI) is a popular method for dealing with missing values. However, the appropriate way to apply clustering after MI remains unclear: how should partitions be pooled, and how should clustering instability be assessed when data are incomplete? By answering both questions, this paper proposes a complete view of clustering with missing data using MI. The problem of pooling partitions is addressed using consensus clustering while, based on bootstrap theory, we explain how to assess the instability related to both observed and missing data. The new rules for pooling partitions and assessing instability are theoretically justified and extensively studied by simulation. Pooling partitions improves accuracy, while measuring instability with missing data broadens the possibilities for data analysis: it allows assessment of the dependence of the clustering on the imputation model, and it provides a convenient way of choosing the number of clusters when data are incomplete, as illustrated on a real data set.
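To make the idea of pooling partitions concrete, the following is a minimal sketch, not the rule proposed in the paper. It assumes that each of the M imputed data sets has already been clustered, and it pools the M label vectors with one simple consensus scheme chosen here for illustration: a co-association matrix thresholded by majority vote. The function names (`co_association`, `consensus_partition`, `between_imputation_instability`) and the 0.5 threshold are hypothetical. The instability measure shown only captures disagreement between imputations (one minus the mean pairwise Rand index); it does not implement the bootstrap-based assessment described above.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of observations shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Pool partitions: link two observations when they are co-clustered in
    more than `threshold` of the partitions, then take connected components.
    (Illustrative assumption, not necessarily the paper's pooling rule.)"""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def rand_index(a, b):
    """Share of observation pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def between_imputation_instability(partitions):
    """One minus the mean pairwise Rand index across the M partitions."""
    scores = [rand_index(p, q) for p, q in combinations(partitions, 2)]
    return 1.0 - sum(scores) / len(scores)


# Toy input: cluster labels obtained from M = 3 imputed data sets (made up).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as above, labels switched
    [0, 0, 1, 1, 1, 1],  # the third observation changes cluster
]
consensus = consensus_partition(partitions)
instability = between_imputation_instability(partitions)
print(consensus, round(instability, 3))
```

Working on co-clustering of pairs rather than on raw labels makes the pooling insensitive to label switching between imputed data sets, which is why the second toy partition counts as identical to the first.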