The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where pre-treatment outcome data is available and the synthetic control estimator is used. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we observe, perhaps surprisingly, that the optimal experimental design problem can be reduced to a so-called \textit{phase synchronization} problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics data and the Abadie-Diamond-Hainmueller California Smoking Data. In terms of root mean square error, our algorithm surpasses the random design by a large margin.
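To make the algorithmic idea concrete, the following is a minimal sketch of a generalized power method with spectral initialization for the binary (±1) case of phase synchronization. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the normalization used, so the sketch uses the plain sign-projection iteration. The matrix `A`, the planted vector `z`, the noise model, and the encoding of a design as a ±1 vector (+1 for treated, -1 for control) are all hypothetical choices made for the example.

```python
import numpy as np


def spectral_init(A):
    """Round the leading eigenvector of symmetric A to a +/-1 vector."""
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(A)
    v = eigvecs[:, -1]
    return np.where(v >= 0, 1.0, -1.0)


def generalized_power_method(A, max_iter=100):
    """Iterate x <- sign(A x), starting from the spectral initializer.

    Assumption: plain sign projection; the paper's normalized variant
    may differ from this step.
    """
    x = spectral_init(A)
    for _ in range(max_iter):
        y = A @ x
        x_new = np.where(y >= 0, 1.0, -1.0)
        if np.array_equal(x_new, x):  # reached a fixed point
            break
        x = x_new
    return x


def demo(n=40, noise=0.5, seed=0):
    """Recover a planted +/-1 vector z from A = z z^T + noise * W.

    The planted model is hypothetical and only serves to exercise the
    iteration; it is not the data-generating process from the paper.
    """
    rng = np.random.default_rng(seed)
    z = rng.choice([-1.0, 1.0], size=n)
    W = rng.normal(size=(n, n))
    W = (W + W.T) / np.sqrt(2)  # symmetric Gaussian noise
    A = np.outer(z, z) + noise * W
    x = generalized_power_method(A)
    # Overlap in [0, 1]; the absolute value removes the global sign ambiguity.
    return x, abs(x @ z) / n


if __name__ == "__main__":
    x, overlap = demo()
    print("overlap with planted vector:", overlap)
```

The solution is only identifiable up to a global sign flip, which is why the demo reports the absolute overlap. In a design setting this would correspond to swapping the treated and control labels.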