Causal inference in program evaluation faces a problem of external validity when the treatment effect in the target population differs from the treatment effect identified from the population that the sample represents. This paper focuses on the case where this discrepancy arises from a stratified sampling design based on individual treatment status and other characteristics. In such settings, the design probability is known from the sampling design, but the target population depends on the underlying population share vector, which is often unknown; except in special cases, the treatment effect parameters are not identified. We propose a method of constructing confidence sets that are valid for a given range of population shares. When a benchmark population share vector and a corresponding estimator of a treatment effect parameter are given, we develop a method to discover the scope of external validity with familywise error rate control. Finally, we derive an optimal sampling design that minimizes the semiparametric efficiency bound given a population share associated with a target population. We provide Monte Carlo simulation results and an empirical application to demonstrate the usefulness of our proposals.
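To illustrate why the target parameter depends on the unknown population shares, consider a deliberately simplified case that is an assumption of this sketch and not the paper's general setup: two strata defined by treatment status only, with the average effect on the treated (`att`) and on the untreated (`atu`) each taken as known within its stratum. By the law of total expectation, the population average treatment effect is then a share-weighted mix of the two, so a range of shares maps to a range of effects. The function names below are hypothetical, and the sketch does not implement the confidence sets proposed in the paper.

```python
from typing import Tuple


def ate_given_share(att: float, atu: float, treated_share: float) -> float:
    """Population ATE implied by a treated share (illustrative assumption).

    Assumes two strata (treated, untreated) with stratum-level effects
    att and atu treated as known. The mixture formula is
    ATE = p * ATT + (1 - p) * ATU, where p is the population treated share.
    """
    if not 0.0 <= treated_share <= 1.0:
        raise ValueError("treated_share must lie in [0, 1]")
    return treated_share * att + (1.0 - treated_share) * atu


def ate_range(att: float, atu: float,
              share_lo: float, share_hi: float) -> Tuple[float, float]:
    """Smallest and largest ATE over treated shares in [share_lo, share_hi].

    The ATE is linear in the share, so the extremes occur at the endpoints.
    """
    if share_lo > share_hi:
        raise ValueError("share_lo must not exceed share_hi")
    ends = (ate_given_share(att, atu, share_lo),
            ate_given_share(att, atu, share_hi))
    return min(ends), max(ends)
```

For example, with `att = 2.0` and `atu = 1.0`, treated shares between 0.25 and 0.75 imply an average treatment effect between 1.25 and 1.75. Without further information on the share, the parameter is pinned down only to that interval, which is the sense in which point identification fails outside special cases such as `att == atu`.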