Researchers have long run regressions of an outcome variable (Y) on a treatment (D) and covariates (X) to estimate treatment effects. Even absent unobserved confounding, the regression coefficient on D in this setup reports a conditional variance weighted average of strata-wise average effects, not generally equal to the average treatment effect (ATE). Numerous proposals have been offered to cope with this "weighting problem", including interpretational tools to help characterize the weights and diagnostic aids to help researchers assess the potential severity of this problem. We make two contributions that together suggest an alternative direction for researchers and this literature. Our first contribution is conceptual, demystifying these weights. Simply put, under heterogeneous treatment effects (and varying probability of treatment), the linear regression of Y on D and X will be misspecified. The "weights" of regression offer one characterization for the coefficient from regression that helps to clarify how it will depart from the ATE. We also derive a more general expression for the weights than what is usually referenced. Our second contribution is practical: as these weights simply characterize misspecification bias, we suggest simply avoiding them through an approach that tolerate heterogeneous effects. A wide range of longstanding alternatives (regression-imputation/g-computation, interacted regression, and balancing weights) relax specification assumptions to allow heterogeneous effects. We make explicit the assumption of "separate linearity", under which each potential outcome is separately linear in X. This relaxation of conventional linearity offers a common justification for all of these methods and avoids the weighting problem, at an efficiency cost that will be small when there are few covariates relative to sample size.
翻译:暂无翻译