In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabeled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference under post-prediction setting, called Prediction De-Correlated Inference (PDC). Our approach is safe, in the sense that PDC can automatically adapt to any black-box machine-learning model and consistently outperform the supervised counterparts. The PDC framework also offers easy extensibility for accommodating multiple predictive models. Both numerical results and real-world data analysis demonstrate the superiority of PDC over the state-of-the-art methods.
翻译:暂无翻译