Prediction-based (PB) inference is increasingly used in applications where the outcome of interest is difficult to obtain, but its predictors are readily available. Unlike traditional inference, PB inference performs statistical inference using a partially observed outcome and a set of covariates by leveraging a prediction of the outcome generated from a machine learning (ML) model. Motwani and Witten (2023) recently revisited two innovative PB inference approaches for ordinary least squares. They found that the method proposed by Wang et al. (2020) yields a consistent estimator for the association of interest when the ML model perfectly captures the underlying regression function. Conversely, the prediction-powered inference (PPI) method proposed by Angelopoulos et al. (2023) yields valid inference regardless of the model's accuracy. In this paper, we study the statistical efficiency of the PPI estimator. Our analysis reveals that a more efficient estimator, proposed 25 years ago by Chen and Chen (2000), can be obtained by simply adding a weight to the PPI estimator. We also contextualize PB inference with methods from the economics and statistics literature dating back to the 1960s. Our extensive theoretical and numerical analyses indicate that the Chen and Chen (CC) estimator offers a balance between robustness to ML model specification and statistical efficiency, making it the preferred choice for use in practice.
翻译:暂无翻译