Some applied researchers hesitate to use nonparametric methods, worrying that they will lose power in small samples or overfit the data when simpler models are sufficient. We argue that at least some of these concerns are unfounded when nonparametric models are strongly shrunk toward parametric submodels. We consider expanding a parametric model with a nonparametric component that is heavily shrunk toward zero. This construction allows the model to adapt automatically: if the parametric model is correct, the nonparametric component disappears, recovering parametric efficiency, while if it is misspecified, the flexible component activates to capture the missing signal. We show that this adaptive behavior follows from simple and general conditions. Specifically, we prove that Bayesian nonparametric models anchored to linear regression, including variants of Gaussian process regression and Bayesian additive regression trees, consistently identify the correct parametric submodel when it holds and yield asymptotically efficient inference for the regression coefficients. In simulations, we find that the "general BART" model performs identically to a correctly specified linear regression when the parametric model holds, and substantially outperforms it when nonlinear effects are present. This suggests a practical paradigm: "defensive model expansion" as a safeguard against model misspecification.
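As a minimal sketch of the construction (the notation below is illustrative and not the paper's full specification), consider a partially linear expansion of a linear regression, with the nonparametric remainder given a prior scale shrunk heavily toward zero:
\[
  y_i = x_i^\top \beta + f(x_i) + \varepsilon_i, \qquad
  \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad
  f \sim \mathcal{GP}\bigl(0, \tau^2 k\bigr),
\]
where $k$ is a covariance kernel and $\tau$ is small a priori. If the linear model holds, the posterior concentrates near $f \equiv 0$ and inference for $\beta$ is essentially parametric; if it does not, $f$ activates and absorbs the nonlinear signal.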