Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a follow-up sample allows researchers to conduct hypothesis tests for MNAR, which are not possible when using only the original incomplete data. How the properties of these tests are affected by the follow-up sample design has been little explored in the literature. Our results provide comprehensive insight into the properties of one such test, based on the commonly used selection model framework. We determine conditions for recovery samples that allow the test to be applied appropriately and effectively, i.e. with known Type I error rates and optimized with respect to power. We thus provide an integrated framework for testing for the presence of MNAR and designing follow-up samples in an efficient, cost-effective way. The performance of our methodology is evaluated through simulation studies as well as on a real data sample.