Missing data are a common problem in clinical data collection and complicate the statistical analysis of such data. In this article, we consider the problem within the framework of a semiparametric partially linear model when observations are subject to missingness with complex patterns. When the correct structure of the additive partially linear model is known, we propose a new imputation method called Partial Replacement IMputation Estimation (PRIME), which can overcome the problems caused by incomplete data in the partially linear model. We also combine PRIME with model averaging (PRIME-MA) to handle the case where the model structure is unknown. In simulation studies covering various error distributions, sample sizes, missing data rates, covariate correlations, and noise levels, PRIME outperforms other methods in almost all cases. When the correct model structure is unknown, PRIME-MA achieves satisfactory predictive performance, although slightly worse than that of PRIME. Moreover, we conduct a study of influential factors in the Pima Indians Diabetes data, which shows that our method performs better than the other models.