With a categorical treatment D=0,1,...,J, the ubiquitous practice is to construct dummy variables D(1),...,D(J) and run the OLS of an outcome Y on D(1),...,D(J) and covariates X. Letting m(d,X) denote the X-heterogeneous effect of D(d) given X, this paper shows that, in "saturated models", the OLS slope of D(d) is consistent for a sum of weighted averages of m(1,X),...,m(J,X), in which the weights on m(d,X) sum to one whereas the weights on each of the other X-heterogeneous effects sum to zero. Hence, if m(1,X),...,m(J,X) are all constant, with m(d,X)=b(d), then the OLS slope of D(d) is consistent for b(d); otherwise, OLS is generally inconsistent even in saturated models, because the heterogeneous effects of the other categories "interfere". In unsaturated models, OLS is in general inconsistent even for a binary D. Instead, one can estimate the effect of D(d) separately for each d=1,...,J by the OLS of Y on D(d)-E{D(d)|X, D=0,d}, using only the subsample with D=0 or D=d. This subsample OLS is consistent for the "overlap-weight" average of m(d,X). Although we parametrize E{D(d)|X, D=0,d} for practicality, replacing Y with Y-E(Y|X, D=0,d), or a variation thereof, makes the OLS robust to misspecification of E{D(d)|X, D=0,d}.
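The subsample OLS can be illustrated with a minimal Python sketch on simulated data. Everything in it is an illustrative assumption, not the paper's implementation. The function names `subsample_ols_effect`, `simulate` and `overlap_target` are hypothetical, X is taken to be discrete with three values, and E{D(d)|X, D=0,d} and E(Y|X, D=0,d) are estimated by within-cell sample means rather than by the parametric model mentioned above. Writing p(X)=E{D(d)|X, D=0,d}, the target is the overlap-weight average E[p(X){1-p(X)}m(d,X)]/E[p(X){1-p(X)}] taken over the subsample. This holds because, within the subsample, the covariance of Y with D(d)-p(X) equals E[p(X){1-p(X)}m(d,X)] and the variance of D(d)-p(X) equals E[p(X){1-p(X)}].

```python
import numpy as np


def subsample_ols_effect(y, d, x, level, robust=True):
    """Hypothetical sketch: effect of D(level) from the subsample D in {0, level}.

    Assumption: x is discrete, so E{D(level)|X, D=0,level} and E(Y|X, D=0,level)
    are estimated by within-cell means (the paper parametrizes the former).
    """
    keep = (d == 0) | (d == level)            # keep only D = 0 or D = level
    y_s, x_s = y[keep], x[keep]
    dd = (d[keep] == level).astype(float)     # dummy D(level) in the subsample

    p_hat = np.empty_like(dd)
    mu_hat = np.empty_like(y_s)
    for v in np.unique(x_s):
        cell = x_s == v
        p_hat[cell] = dd[cell].mean()         # estimate of E{D(level)|X=v, D=0,level}
        mu_hat[cell] = y_s[cell].mean()       # estimate of E(Y|X=v, D=0,level)

    r = dd - p_hat                            # regressor D(level) - E{D(level)|X, ...}
    outcome = y_s - mu_hat if robust else y_s
    rc = r - r.mean()
    # OLS slope with an intercept = sample covariance / sample variance
    return float(rc @ (outcome - outcome.mean()) / (rc @ rc))


def simulate(n, rng):
    """Hypothetical design: X in {0,1,2}, D in {0,1,2}, heterogeneous m(d,X)."""
    probs = np.array([[0.6, 0.2, 0.2],       # P(D=j | X=0)
                      [0.3, 0.5, 0.2],       # P(D=j | X=1)
                      [0.2, 0.3, 0.5]])      # P(D=j | X=2)
    x = rng.integers(0, 3, size=n)
    cum = probs.cumsum(axis=1)
    cum[:, -1] = 1.0                          # guard against rounding in the CDF
    d = (rng.random(n)[:, None] > cum[x]).sum(axis=1)  # categorical draw
    # m[j, x]: m(0,X)=0, m(1,X)=1+X, m(2,X)=2-X
    m = np.stack([np.zeros(3), 1.0 + np.arange(3), 2.0 - np.arange(3)])
    y = x + m[d, x] + rng.normal(size=n)
    return y, d, x, probs, m


def overlap_target(probs, m, level):
    """Population overlap-weight average of m(level, X) over the subsample."""
    px = np.full(probs.shape[0], 1.0 / probs.shape[0])  # X uniform by design
    in_sub = probs[:, 0] + probs[:, level]               # P(D in {0, level} | X)
    p = probs[:, level] / in_sub                         # E{D(level)|X, D=0,level}
    w = px * in_sub * p * (1 - p)                        # subsample density x p(1-p)
    return float(w @ m[level] / w.sum())


rng = np.random.default_rng(0)
y, d, x, probs, m = simulate(200_000, rng)
for level in (1, 2):
    est = subsample_ols_effect(y, d, x, level)
    print(level, round(est, 3), round(overlap_target(probs, m, level), 3))
```

With these cell-mean (saturated) estimates, the robust and plain versions coincide exactly, because D(d) minus its estimated conditional mean sums to zero within every X cell. The two versions differ only when E{D(d)|X, D=0,d} is given a parametric model that may be misspecified.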