To overcome challenges in fitting complex models with small samples, catalytic priors have recently been proposed to stabilize inference by supplementing observed data with synthetic data generated from simpler models. Based on a catalytic prior, the Maximum A Posteriori (MAP) estimator is a regularized estimator that maximizes the weighted likelihood of the combined data. This estimator is straightforward to compute, and its numerical performance is superior or comparable to that of other likelihood-based estimators. In this paper, we study several theoretical aspects of the MAP estimator in generalized linear models, with a particular focus on logistic regression. We first prove that under mild conditions, the MAP estimator exists and is stable against the randomness in the synthetic data. We then establish the consistency of the MAP estimator when the dimension of the covariates diverges more slowly than the sample size. Furthermore, we utilize the convex Gaussian min-max theorem to characterize the asymptotic behavior of the MAP estimator as the dimension grows linearly with the sample size. These theoretical results clarify the role of the tuning parameters in a catalytic prior and provide insights into practical applications. We provide numerical studies to confirm that our asymptotic theory approximates finite-sample behavior well and to illustrate how inference can be adjusted based on the theory.
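To make the construction concrete, the following is a minimal sketch (not the authors' code) of the catalytic-prior MAP estimate for logistic regression: it maximizes the observed-data log-likelihood plus a down-weighted log-likelihood of M synthetic points. The synthetic-data generator used here (independently resampled covariates, responses drawn from a fitted intercept-only model) and the total prior weight `tau` are illustrative assumptions, not prescriptions from the paper.

```python
# Sketch of the MAP estimator under a catalytic prior for logistic regression.
# Assumed choices: synthetic covariates resampled column-wise from the observed
# design, synthetic responses from an intercept-only model, prior weight tau.
import numpy as np
from scipy.optimize import minimize

def log_lik(beta, X, y):
    """Logistic log-likelihood: sum_i [y_i * x_i'beta - log(1 + exp(x_i'beta))]."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def catalytic_map(X, y, M=400, tau=4.0, rng=None):
    """MAP estimate with M synthetic points and total synthetic weight tau."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Synthetic covariates: resample each column independently (one simple choice).
    X_syn = np.column_stack([rng.choice(X[:, j], size=M) for j in range(p)])
    # Synthetic responses: drawn from the fitted intercept-only (simpler) model.
    p0 = np.clip(y.mean(), 1e-3, 1 - 1e-3)
    y_syn = rng.binomial(1, p0, size=M).astype(float)
    # Weighted objective: observed log-likelihood + (tau / M) * synthetic log-likelihood.
    def neg_obj(beta):
        return -(log_lik(beta, X, y) + (tau / M) * log_lik(beta, X_syn, y_syn))
    return minimize(neg_obj, x0=np.zeros(p), method="BFGS").x
```

Because the synthetic term contributes a strictly concave penalty whenever the synthetic design spans the covariate space, the weighted likelihood has a finite maximizer even when the observed data are separable, which is the intuition behind the existence and stability results described above.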