We examine the performance of efficient and AIPW estimators under two-phase sampling when the complete-data model is nearly correctly specified, in the sense that the misspecification cannot be reliably detected from the data by any diagnostic or test. Asymptotic results for these nearly true models are obtained by representing them as sequences of misspecified models that are mutually contiguous with a correctly specified model. We find that, for the least-favourable direction of model misspecification, the induced bias in the efficient estimator can be comparable to the extra variability in the AIPW estimator, so that the mean squared error of the efficient estimator is no longer lower. This can happen even when the most powerful test for the model misspecification has only modest power. We verify that the theoretical results agree with simulation in three examples: a simple informative-sampling model for a Normal mean, logistic regression in the classical case-control design, and linear regression in a two-phase design.
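
The trade-off between bias and extra variability can be made concrete with a standard mean-squared-error decomposition. The following is only an illustrative sketch in notation that is assumed here and not taken from the paper: $\hat\theta_{\mathrm{eff}}$ and $\hat\theta_{\mathrm{AIPW}}$ denote the two estimators, $\sigma^2_{\mathrm{eff}} \le \sigma^2_{\mathrm{AIPW}}$ their asymptotic variances, and $\delta$ the limiting bias of $\sqrt{n}\,(\hat\theta_{\mathrm{eff}} - \theta)$ along the contiguous sequence of misspecified models; the AIPW estimator is assumed to remain asymptotically unbiased along that sequence.

```latex
% Assumed notation (not from the paper): delta is the limiting bias of
% sqrt(n) * (theta_hat_eff - theta) under the contiguous misspecification.
\begin{align*}
  n \,\mathrm{MSE}\bigl(\hat\theta_{\mathrm{eff}}\bigr)
    &\to \sigma^2_{\mathrm{eff}} + \delta^2, \\
  n \,\mathrm{MSE}\bigl(\hat\theta_{\mathrm{AIPW}}\bigr)
    &\to \sigma^2_{\mathrm{AIPW}}, \\
  \text{so the efficient estimator retains the lower limiting MSE only if}\quad
  \delta^2 &< \sigma^2_{\mathrm{AIPW}} - \sigma^2_{\mathrm{eff}}.
\end{align*}
```

Under these assumptions, the efficiency gain is lost once the squared limiting bias reaches the gap between the two asymptotic variances.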