When designing a randomized experiment, one way to ensure that treatment and control groups exhibit similar covariate distributions is to repeatedly randomize treatment assignment until a prespecified level of covariate balance is satisfied; this strategy is known as rerandomization. Most rerandomization methods use balance metrics based on a quadratic form $\mathbf{v}^T \mathbf{A} \mathbf{v}$, where $\mathbf{v}$ is a vector of covariate mean differences and $\mathbf{A}$ is a positive semi-definite matrix. In this work, we derive general results for treatment-versus-control rerandomization schemes that employ quadratic forms for covariate balance. In addition to allowing researchers to quickly derive properties of rerandomization schemes not previously considered, our theoretical results provide guidance on how to choose $\mathbf{A}$ in practice. We find that the Mahalanobis and Euclidean distances optimize different measures of covariate balance. Furthermore, we establish how the covariates' eigenstructure and their relationship to the outcomes dictate which matrix $\mathbf{A}$ yields the most precise difference-in-means estimator for the average treatment effect. We find that the Euclidean distance is minimax optimal, in the sense that the precision of the resulting difference-in-means estimator is never too far from that achieved by the optimal choice of $\mathbf{A}$. We verify our theoretical results via simulation and a real data application, and demonstrate how the choice of $\mathbf{A}$ impacts the variance reduction of rerandomized experiments.
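To make the procedure concrete, the following is a minimal sketch of rerandomization with a quadratic-form balance criterion $\mathbf{v}^T \mathbf{A} \mathbf{v}$, comparing the Mahalanobis choice of $\mathbf{A}$ with the Euclidean choice $\mathbf{A} = \mathbf{I}$. It is an illustration only, not the procedure used in this work: the function names, the simulated covariates, the 10% acceptance rate, and the Monte Carlo rule for setting the acceptance threshold are all assumptions made for the example.

```python
import numpy as np

def covariate_mean_diff(X, w):
    """v: treated-group covariate means minus control-group covariate means."""
    return X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)

def draw_assignment(n, n_treated, rng):
    """One complete randomization: exactly n_treated units receive treatment."""
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, size=n_treated, replace=False)] = 1
    return w

def empirical_threshold(X, A, n_treated, rng, accept_prob=0.1, n_sim=2000):
    """Assumed rule: threshold = accept_prob quantile of v^T A v over simulated draws."""
    n = X.shape[0]
    values = []
    for _ in range(n_sim):
        v = covariate_mean_diff(X, draw_assignment(n, n_treated, rng))
        values.append(v @ A @ v)
    return np.quantile(values, accept_prob)

def rerandomize(X, A, threshold, n_treated, rng, max_draws=10_000):
    """Redraw the assignment until the quadratic form v^T A v is <= threshold."""
    n = X.shape[0]
    for draw in range(1, max_draws + 1):
        w = draw_assignment(n, n_treated, rng)
        v = covariate_mean_diff(X, w)
        if v @ A @ v <= threshold:
            return w, draw
    raise RuntimeError("no acceptable assignment found within max_draws")

rng = np.random.default_rng(0)
n, k = 100, 3
n_treated = n // 2
# Simulated covariates with unequal scales (illustrative data only).
X = rng.normal(size=(n, k)) * np.array([1.0, 2.0, 5.0])

# Covariance of v under complete randomization: n / (n1 * n0) times the
# sample covariance of X. Its inverse gives the Mahalanobis choice of A.
cov_v = np.cov(X, rowvar=False) * n / (n_treated * (n - n_treated))
A_mahalanobis = np.linalg.inv(cov_v)
A_euclidean = np.eye(k)

thr_m = empirical_threshold(X, A_mahalanobis, n_treated, rng)
thr_e = empirical_threshold(X, A_euclidean, n_treated, rng)
w_m, draws_m = rerandomize(X, A_mahalanobis, thr_m, n_treated, rng)
w_e, draws_e = rerandomize(X, A_euclidean, thr_e, n_treated, rng)

print("Mahalanobis: accepted after", draws_m, "draws")
print("Euclidean:   accepted after", draws_e, "draws")
```

Swapping in a different positive semi-definite `A` only changes the matrix passed to `empirical_threshold` and `rerandomize`, which is what makes the quadratic-form framing convenient for comparing schemes.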