The regression discontinuity (RD) design is widely used for program evaluation with observational data. The primary focus of the existing literature has been the estimation of the local average treatment effect at the existing treatment cutoff. In contrast, we consider policy learning under the RD design. Because the treatment assignment mechanism is deterministic, learning better treatment cutoffs requires extrapolation. We develop a robust optimization approach to finding optimal treatment cutoffs that improve upon the existing ones. We first decompose the expected utility into point-identifiable and unidentifiable components. We then propose an efficient doubly-robust estimator for the identifiable parts. To account for the unidentifiable components, we leverage the existence of multiple cutoffs, which is common under the RD design. Specifically, we assume that the heterogeneity in the conditional expectations of potential outcomes across different groups varies smoothly along the running variable. Under this assumption, we minimize the worst-case utility loss relative to the status quo policy. The resulting new treatment cutoffs come with a safety guarantee: they will not yield a worse overall outcome than the existing cutoffs. Finally, we establish asymptotic regret bounds for the learned policy using semi-parametric efficiency theory. We apply the proposed methodology to empirical and simulated data sets.