Missing data theory concerns statistical methods for data sets in which some values are missing. Missing data arise when some values of the variables of interest are not recorded or observed. Most statistical theory, however, assumes that the data are fully observed. One way to handle an incomplete database is to fill in the missing entries according to some criterion; this technique is called imputation. We introduce a new imputation methodology for databases with univariate missing patterns that exploits additional information from fully observed auxiliary variables. We assume that the variable with missing values is continuous and that the auxiliary variables help improve the imputation capacity of the model. In a fully Bayesian framework, our method uses a flexible mixture of multivariate normal distributions to model the response and the auxiliary variables jointly. Under this framework, we use the properties of Gaussian cluster-weighted modeling to construct a predictive model that imputes the missing values using the information in the covariates. Simulation studies and a real data illustration show the method's imputation capacity under a variety of scenarios and in comparison with other methods from the literature.
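To make the predictive step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single auxiliary variable x, a response y, and mixture parameters that are already known (for instance, one posterior draw); the abstract itself gives no parameter values or code. Under a mixture of bivariate normals for (y, x), each component contributes a linear regression of y on x, and the regressions are averaged with weights proportional to the component weight times the marginal density of x in that component. The names `cwm_conditional_mean` and `impute`, the dictionary keys, and all numbers are hypothetical. The sketch imputes with the conditional mean only; a fully Bayesian treatment would instead draw from the posterior predictive distribution.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cwm_conditional_mean(x, components):
    """E[y | x] under a mixture of bivariate normals over (y, x).

    Each component is a dict with hypothetical keys:
    weight, mean_y, mean_x, var_x, cov_yx.
    """
    # Cluster weights given x: proportional to pi_k * N(x | mean_x_k, var_x_k).
    dens = [c["weight"] * normal_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(dens)
    if total == 0.0:
        # x is so far from every component that the densities underflow;
        # fall back to the prior mixture weights (an assumption of this sketch).
        dens = [c["weight"] for c in components]
        total = sum(dens)

    prediction = 0.0
    for d, c in zip(dens, components):
        # Within-component regression of y on x implied by the joint normal.
        slope = c["cov_yx"] / c["var_x"]
        local_mean = c["mean_y"] + slope * (x - c["mean_x"])
        prediction += (d / total) * local_mean
    return prediction


def impute(y_values, x_values, components):
    """Replace each missing response (None) by its conditional mean given x."""
    return [
        cwm_conditional_mean(x, components) if y is None else y
        for y, x in zip(y_values, x_values)
    ]


# Illustrative parameters only (assumed, not taken from the paper).
components = [
    {"weight": 0.5, "mean_y": 0.0, "mean_x": -2.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mean_y": 5.0, "mean_x": 2.0, "var_x": 1.0, "cov_yx": -0.5},
]

completed = impute([1.2, None, 4.8], [-1.5, 0.0, 2.5], components)
```

In this toy setup the covariate value x = 0 lies midway between the two clusters, so both local regressions receive equal weight; values of x close to one cluster mean are imputed almost entirely from that cluster's regression line.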