Valid statistical inference is challenging when the sample is subject to unknown selection bias. Data integration can correct for selection bias when a parallel probability sample from the same population shares some common measurements with the non-probability sample. The challenging part of data integration is modeling and estimating the selection probability, or propensity score (PS), of the non-probability sample using an independent probability sample. We approach this problem by combining multiple candidate PS models with empirical likelihood. Incorporating multiple PS models into the internal bias calibration constraint of the empirical likelihood setup eliminates the selection bias as long as the set of candidate models contains a true PS model. We call the bias calibration constraint under multiple PS models multiple bias calibration. The candidate PS models can include both missing-at-random and missing-not-at-random models. Asymptotic properties are discussed, and limited simulation studies compare the proposed method with existing competitors. Plasmode simulation studies using the Culture & Community in a Time of Crisis dataset demonstrate the practical use and advantages of the proposed method.