Missing data is a common challenge in biomedical research. This fact, together with the growing size of modern datasets, makes computationally efficient analysis with missing data a matter of crucial practical importance. A general, computationally efficient estimation framework for handling missing data is the pseudo-expected estimating equations (PEEE) approach. The method applies to any parametric model whose estimation involves solving a set of estimating equations, such as likelihood score equations. A key limitation of the PEEE approach is that no closed-form variance estimator is currently available, so variance estimation relies on the computationally burdensome bootstrap. In this work, we address this gap by providing a closed-form variance estimator whose computation can be substantially faster than the bootstrap. Our variance estimator is shown to be consistent even in the presence of auxiliary variables and under misspecified models for the incomplete variables. Simulation studies show that our variance estimator performs well and that its computation can be over 50 times faster than the bootstrap. This computational gain is crucial for large datasets or when the main analysis method is computationally intensive. Finally, we apply the PEEE approach, together with our variance estimator, to incomplete electronic health record data from patients with traumatic brain injury.