Two-phase outcome-dependent sampling (ODS) is widely used in many fields, especially when certain covariates are expensive and/or difficult to measure. For two-phase ODS, the conditional maximum likelihood (CML) method is attractive because it can handle zero Phase 2 selection probabilities and avoids modeling the covariate distribution. However, most existing CML-based methods use only the Phase 2 sample and thus may be less efficient than methods that also use the Phase 1 data. We propose a general empirical likelihood method that augments CML with additional information from the whole Phase 1 sample to improve estimation efficiency. The proposed method retains the ability to handle zero selection probabilities and still avoids modeling the covariate distribution, but, because it makes effective use of the Phase 1 data, it can yield substantial efficiency gains over CML for the inexpensive covariates, or for the influential covariate when a surrogate is available. We present simulations and a real-data illustration using NHANES data.