Covariate Balancing Based on Kernel Density Estimates for Controlled Experiments

Controlled experiments are widely used to investigate the causal relationship between input factors and experimental outcomes. A completely randomized design is usually used to assign treatment levels to experimental units at random. When covariates of the experimental units are available, the design should achieve covariate balance among the treatment groups, so that statistical inference on the treatment effects is not confounded with any effects of the covariates. However, covariate imbalance often exists, because the experiment is carried out based on a single realization of the complete randomization. Imbalance is more likely, and tends to be more severe, when the number of experimental units is small or moderate. In this paper, we introduce a new covariate balancing criterion that measures the differences between the kernel density estimates of the covariates in the treatment groups. To achieve covariate balance before the treatments are randomly assigned, we partition the experimental units by minimizing the criterion, then randomly assign the treatment levels to the partitioned groups. Through numerical examples, we show that the proposed partition approach can improve the accuracy of the difference-in-means estimator and can outperform both complete randomization and rerandomization.
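To make the partition-then-randomize idea concrete, here is a minimal sketch for two treatment groups. It is not the paper's algorithm: the abstract does not give the exact form of the criterion or the optimizer, so the sketch assumes an integrated squared difference between Gaussian kernel density estimates with a fixed bandwidth h (which has a closed form), equal group sizes, and a simple pairwise-swap local search. The function names gram_matrix, kde_imbalance, balanced_partition and assign_treatments are hypothetical.

```python
import numpy as np


def gram_matrix(X, h):
    """Integrals of products of Gaussian kernels (bandwidth h) centred at rows of X.

    For two Gaussian kernels with covariance h^2 I, the integral of their product
    is a Gaussian density with covariance 2 h^2 I evaluated at the difference.
    """
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (4.0 * h**2)) / (4.0 * np.pi * h**2) ** (d / 2.0)


def kde_imbalance(G, labels):
    """Assumed criterion: integrated squared difference of the two groups' KDEs."""
    a = labels == 0
    b = labels == 1
    return (G[np.ix_(a, a)].mean()
            - 2.0 * G[np.ix_(a, b)].mean()
            + G[np.ix_(b, b)].mean())


def balanced_partition(X, h=1.0, n_sweeps=20, seed=0):
    """Greedy swap search for a two-group partition with a small criterion value."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    G = gram_matrix(X, h)
    # Start from a random equal-size split (sizes differ by one if n is odd).
    labels = np.zeros(n, dtype=int)
    labels[rng.permutation(n)[: n // 2]] = 1
    best = kde_imbalance(G, labels)
    for _ in range(n_sweeps):
        improved = False
        for i in np.flatnonzero(labels == 0):
            for j in np.flatnonzero(labels == 1):
                labels[i], labels[j] = 1, 0      # try swapping units i and j
                value = kde_imbalance(G, labels)
                if value < best - 1e-12:
                    best = value                 # keep the improving swap
                    improved = True
                    break
                labels[i], labels[j] = 0, 1      # otherwise undo the swap
        if not improved:
            break                                # local minimum reached
    return labels, best


def assign_treatments(labels, seed=0):
    """Randomly choose which partitioned group receives the treatment."""
    rng = np.random.default_rng(seed)
    treated_group = rng.integers(2)
    return (labels == treated_group).astype(int)  # 1 = treated, 0 = control


# Illustrative run on synthetic covariates (not data from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
labels, value = balanced_partition(X, h=0.5)
treatment = assign_treatments(labels, seed=2)
print(value, treatment)
```

The partition step is deterministic given the starting split, so the randomness needed for inference enters only through the final assignment of treatment levels to groups. The bandwidth h is left as a free parameter here; a data-driven choice would be a natural refinement.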
