Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and directly applying standard approaches for incorporating validation data into analyses may yield inefficient estimators, because the information available from intermediate validation steps is only partially used or is ignored entirely. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 82,957 women living with HIV whose data were originally extracted from electronic medical records, of whom 4,855 had their charts reviewed and, of those, 1,203 also had a telephone interview to validate key study variables.