Researchers in many fields endeavor to estimate treatment effects by regressing outcome data (Y) on a treatment (D) and observed confounders (X). Even absent unobserved confounding, the regression coefficient on the treatment reports a weighted average of strata-specific treatment effects (Angrist, 1998). Where heterogeneous treatment effects cannot be ruled out, the resulting coefficient is thus not generally equal to the average treatment effect (ATE), and is unlikely to be the quantity of direct scientific or policy interest. The difference between the coefficient and the ATE has led researchers to propose various interpretational, bounding, and diagnostic aids (Humphreys, 2009; Aronow and Samii, 2016; Sloczynski, 2022; Chattopadhyay and Zubizarreta, 2023). We note that the linear regression of Y on D and X can be misspecified when the treatment effect is heterogeneous in X. The "weights of regression", for which we provide a new (more general) expression, simply characterize how the OLS coefficient will depart from the ATE under the misspecification resulting from unmodeled treatment effect heterogeneity. Consequently, a natural alternative to suffering these weights is to address the misspecification that gives rise to them. For investigators committed to linear approaches, we propose relying on the slightly weaker assumption that the potential outcomes are linear in X. Numerous well-known estimators are unbiased for the ATE under this assumption, namely regression-imputation/g-computation/T-learner, regression with an interaction of the treatment and covariates (Lin, 2013), and balancing weights. Any of these approaches avoid the apparent weighting problem of the misspecified linear regression, at an efficiency cost that will be small when there are few covariates relative to sample size. We demonstrate these lessons using simulations in observational and experimental settings.
翻译:暂无翻译