We investigate the optimal design of experimental studies for which pre-treatment outcome data are available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units jointly with the weights. Noting that the problem is NP-hard, we introduce a mixed-integer programming formulation that selects the treatment and control sets together with the unit weights. We prove that the proposed approaches lead to qualitatively different experimental units being selected for treatment. Simulations based on publicly available data from the US Bureau of Labor Statistics show improvements in mean squared error and statistical power compared with simple and commonly used alternatives such as randomized trials.
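To make the estimator class concrete, the following is a minimal sketch of a weighted difference estimator, assuming nonnegative weights that are normalized within each group. The function names, the toy outcomes, and the weights are hypothetical illustrations and are not taken from the paper; the sketch does not implement the mixed-integer program or any of the proposed selection methods.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights are normalized by their sum."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * v for w, v in zip(weights, values)) / total


def weighted_difference_estimate(outcomes, treated, weights):
    """Weighted average outcome of treated units minus that of control units.

    outcomes: post-treatment outcome for each unit
    treated:  True for treated units, False for control units
    weights:  nonnegative weight for each unit (assumed, illustrative)
    """
    t_vals = [y for y, d in zip(outcomes, treated) if d]
    t_wts = [w for w, d in zip(weights, treated) if d]
    c_vals = [y for y, d in zip(outcomes, treated) if not d]
    c_wts = [w for w, d in zip(weights, treated) if not d]
    return weighted_mean(t_vals, t_wts) - weighted_mean(c_vals, c_wts)


# Hypothetical toy data: four units, the first two treated.
outcomes = [5.0, 7.0, 4.0, 6.0]
treated = [True, True, False, False]

# Uniform weights recover the difference-in-means estimator.
uniform_estimate = weighted_difference_estimate(outcomes, treated, [1.0] * 4)

# Non-uniform weights, in the spirit of synthetic-control-style weighting.
weighted_estimate = weighted_difference_estimate(
    outcomes, treated, [3.0, 1.0, 1.0, 1.0]
)
```

With uniform weights the function reduces to the plain difference in group means, which is why the difference-in-means estimator fits the weighted formulation as a special case.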