This article addresses the question of reporting a lower confidence band (LCB) for optimal welfare in a policy learning problem. A straightforward procedure inverts a one-sided t-test based on an efficient estimator of the optimal welfare. We show that under empirically relevant data-generating processes, this procedure can be dominated by an LCB corresponding to suboptimal welfare, with the average difference of the order $N^{-1/2}$. We relate the first-order dominance result to a lack of uniformity in the margin assumption, a standard sufficient condition for debiased inference on the optimal welfare ensuring that the first-best policy is well-separated from the suboptimal ones. Finally, we show that inverting the existing tests from the moment inequality literature produces LCBs that are robust to the non-uniqueness of the optimal policy and easy to compute. We find that this approach performs well empirically in the context of the National JTPA study.
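
As a rough illustration of the "straightforward procedure" mentioned above, the following minimal sketch inverts a one-sided test to obtain a lower confidence bound for a mean. It assumes, purely for illustration, that the welfare estimator can be written as a sample average of per-observation scores and that a normal critical value is adequate; the function name `lower_confidence_bound` and the `scores` input are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist, mean, stdev


def lower_confidence_bound(scores, alpha=0.05):
    """One-sided (1 - alpha) lower confidence bound for the mean of `scores`.

    Assumption (not from the article): `scores` are per-observation
    contributions whose sample average is the welfare estimate.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = mean(scores)
    # Standard error of the sample mean (sample standard deviation / sqrt(n)).
    std_error = stdev(scores) / math.sqrt(n)
    # Asymptotic normal critical value for a one-sided test at level alpha.
    critical = NormalDist().inv_cdf(1.0 - alpha)
    # Inverting the one-sided test gives: estimate - critical * std_error.
    return estimate - critical * std_error


if __name__ == "__main__":
    example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up numbers
    print(lower_confidence_bound(example_scores))
```

The bound sits below the point estimate by a multiple of the standard error, which shrinks at rate $N^{-1/2}$; this is the same order as the average difference between bounds discussed in the abstract.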