Missing data are frequently encountered in many disciplines and can be divided into three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Valid statistical approaches to missing data depend crucially on correct identification of the underlying missingness mechanism. Although the problem of testing whether this mechanism is MCAR or MAR has been extensively studied, there has been very little research on testing MAR versus MNAR. A critical challenge in this problem is the issue of model identification under MNAR. In this paper, under a logistic model for the missing probability, we develop two score tests of whether the missingness mechanism is MAR or MNAR: one under a parametric model and one under a semiparametric location model for the regression function. The score tests require only parameter estimation under the null MAR assumption, which completely circumvents the identification issue. Our simulations and an analysis of human immunodeficiency virus data show that the score tests have well-controlled type I error rates and desirable power.
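To illustrate why a score test needs only fits under the null, here is a minimal sketch built on an assumed parametric model. It is not the authors' procedure. The assumptions are: R = 1 when Y is observed, Y | X ~ N(b0 + b1'X, s2), and logit P(R = 1 | X, Y) = a0 + a1'X + gamma*Y, so MAR corresponds to gamma = 0. Under these assumptions the per-subject score for gamma at gamma = 0 works out to R(1 - pi)(Y - mu) + mu(R - pi), where pi = P(R = 1 | X) and mu = E(Y | X). Both quantities can be estimated under MAR: pi by a logistic regression of R on X, and mu by complete-case least squares. The unidentified MNAR model is therefore never fitted. The nuisance adjustment uses an outer-product-of-gradients information estimate, and the statistic is referred to a chi-square(1) distribution. The function names (`mar_vs_mnar_score_test`, `_logistic_fit`) are hypothetical, and the paper's semiparametric location-model test is not covered.

```python
import math
import numpy as np


def _logistic_fit(Z, r, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE for the assumed model logit P(R=1 | Z) = Z @ a."""
    a = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ a))
        grad = Z.T @ (r - p)
        hess = Z.T @ (Z * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        a += step
        if np.max(np.abs(step)) < tol:
            break
    return a


def mar_vs_mnar_score_test(x, y, r):
    """Hypothetical score test of H0: gamma = 0 in the ASSUMED selection model
        Y | X ~ N(b0 + b1'X, s2),  logit P(R=1 | X, Y) = a0 + a1'X + gamma*Y.
    Only null (MAR) fits are used; y values where r == 0 are never read."""
    n = len(r)
    Z = np.column_stack([np.ones(n), np.asarray(x).reshape(n, -1)])  # intercept + covariates
    r = np.asarray(r, dtype=float)
    obs = r == 1
    y = np.where(obs, y, 0.0)  # placeholder for missing y; always multiplied by r below

    # Null fit 1: logistic response model without Y, fitted on all subjects.
    pi = 1.0 / (1.0 + np.exp(-Z @ _logistic_fit(Z, r)))
    # Null fit 2: outcome model by complete-case OLS (the MLE under MAR and normality).
    b, *_ = np.linalg.lstsq(Z[obs], y[obs], rcond=None)
    mu = Z @ b
    resid = np.where(obs, y - mu, 0.0)
    s2 = np.sum(resid ** 2) / obs.sum()

    # Per-subject score for gamma evaluated at gamma = 0.
    s_gamma = r * (1 - pi) * resid + mu * (r - pi)
    # Per-subject nuisance scores: logistic coefficients, regression coefficients, variance.
    s_a = (r - pi)[:, None] * Z
    s_b = (r * resid / s2)[:, None] * Z
    s_s2 = r * (-0.5 / s2 + resid ** 2 / (2 * s2 ** 2))
    S = np.column_stack([s_gamma, s_a, s_b, s_s2])

    # Outer-product-of-gradients information; project out the nuisance directions.
    info = S.T @ S / n
    i_gg, i_ge, i_ee = info[0, 0], info[0, 1:], info[1:, 1:]
    v = i_gg - i_ge @ np.linalg.solve(i_ee, i_ge)
    stat = s_gamma.sum() ** 2 / (n * v)
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-square_1 > stat)
    return stat, p_value
```

For example, with `x` a covariate array, `y` containing `np.nan` where unobserved and `r` the 0/1 response indicator, `stat, p = mar_vs_mnar_score_test(x, y, r)`. A small p-value is evidence against MAR, but only within the assumed normal-logistic model. In such selection models, power comes largely from the distributional assumptions.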