Inference for Two-stage Experiments under Covariate-Adaptive Randomization

This paper studies inference in two-stage randomized experiments under covariate-adaptive randomization. Here, by a two-stage randomized experiment, we mean one in which clusters (e.g., households, schools, or graph partitions) are first randomly assigned to different levels of treated fraction, followed by random assignment of units within each treated cluster to either treatment or control based on the selected treated fraction; by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. We examine estimation and inference of such designs under two different asymptotic regimes, namely, "small strata" (e.g., matched-pair designs) and "large strata" (e.g., stratified block randomization). Our analysis of these two regimes enables us to study a broad range of commonly used designs from the empirical literature. We establish conditions under which our estimators are consistent and asymptotically normal and develop consistent estimators of their corresponding asymptotic variances. Combining these results establishes the asymptotic validity of tests based on these estimators. We argue that ignoring covariate information in the design stage can result in efficiency loss, and commonly used inference methods that ignore or improperly use covariate information can lead to either conservative or invalid inference. Then, we apply our results to studying optimal use of covariate information under covariate-adaptive randomization in large samples, and show that a certain generalized matched-pair design achieves minimum asymptotic variance for each proposed estimator. A simulation study and empirical application confirm the practical relevance of our theoretical results.
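To make the design concrete, the following is a minimal sketch of a two-stage assignment in the "small strata" regime, using a plain matched-pair scheme over clusters. It is an illustration under stated assumptions, not the paper's procedure: the function name `two_stage_matched_pair_assignment`, the single scalar cluster-level covariate, the two treated-fraction levels `(0.0, 0.5)`, and the pairing rule (sort by covariate, pair adjacent clusters) are all hypothetical choices made for this example. The generalized matched-pair design mentioned in the abstract is not specified there and is not reproduced here.

```python
import random


def two_stage_matched_pair_assignment(cluster_covariates, cluster_sizes,
                                      fractions=(0.0, 0.5), seed=0):
    """Illustrative two-stage assignment with matched pairs of clusters.

    Assumptions (not taken from the paper): one scalar covariate per cluster,
    exactly two treated-fraction levels, and an even number of clusters.
    Returns (pairs, cluster_fraction, unit_treatment).
    """
    if len(cluster_covariates) != len(cluster_sizes):
        raise ValueError("need one covariate and one size per cluster")
    if len(cluster_covariates) % 2 != 0:
        raise ValueError("matched pairs need an even number of clusters")
    if len(fractions) != 2:
        raise ValueError("this sketch handles exactly two fraction levels")

    rng = random.Random(seed)
    n_clusters = len(cluster_covariates)

    # Stratify: sort clusters by the baseline covariate and pair neighbours,
    # so each stratum (pair) contains two clusters with similar covariates.
    order = sorted(range(n_clusters), key=lambda g: cluster_covariates[g])
    pairs = [(order[i], order[i + 1]) for i in range(0, n_clusters, 2)]

    # Stage 1: within each pair, randomly give one cluster each fraction
    # level, which balances the levels exactly within every stratum.
    cluster_fraction = [None] * n_clusters
    for a, b in pairs:
        levels = list(fractions)
        rng.shuffle(levels)
        cluster_fraction[a], cluster_fraction[b] = levels

    # Stage 2: within each cluster, treat a random subset of units whose
    # size is the assigned fraction times the cluster size, rounded with
    # Python's built-in round().
    unit_treatment = []
    for g in range(n_clusters):
        size = cluster_sizes[g]
        n_treated = round(cluster_fraction[g] * size)
        treated = set(rng.sample(range(size), n_treated))
        unit_treatment.append([1 if i in treated else 0 for i in range(size)])

    return pairs, cluster_fraction, unit_treatment
```

For example, calling `two_stage_matched_pair_assignment([0.1, 0.9, 0.2, 0.8], [4, 4, 6, 6])` pairs the two low-covariate clusters and the two high-covariate clusters, then assigns one cluster in each pair to each treated fraction. A "large strata" variant would instead group many clusters per stratum and balance the fraction levels within each group.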
