This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes we mean that "large" clusters and "small" clusters may be heterogeneous, and, in particular, the effects of the treatment may vary across clusters of differing sizes. To permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using a covariate-adaptive stratified randomization procedure. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study and empirical demonstration show the practical relevance of our theoretical results.
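To make the distinction between the two parameters concrete, the following is a minimal illustrative sketch, not the estimators or inference procedures developed in the paper. It assumes, purely for illustration, that the average unit-level treatment effect within each cluster is known, and it computes the two averages over a small fixed collection of clusters. The function names and the numbers are hypothetical.

```python
# Illustrative sketch only: all names and values here are assumptions,
# not taken from the paper. Each cluster is represented as a pair
# (size, effect), where `effect` is the average unit-level treatment
# effect inside that cluster.

def equally_weighted_effect(clusters):
    """Average of cluster-level effects, each cluster counted once."""
    return sum(effect for _, effect in clusters) / len(clusters)


def size_weighted_effect(clusters):
    """Average of cluster-level effects, weighted by cluster size."""
    total_units = sum(size for size, _ in clusters)
    return sum(size * effect for size, effect in clusters) / total_units


# One "small" and one "large" cluster whose effects differ with size.
heterogeneous = [(5, 1.0), (50, 3.0)]
# Same sizes, but the effect does not depend on size.
homogeneous = [(5, 2.0), (50, 2.0)]

if __name__ == "__main__":
    print(equally_weighted_effect(heterogeneous))
    print(size_weighted_effect(heterogeneous))
    print(equally_weighted_effect(homogeneous))
    print(size_weighted_effect(homogeneous))
```

In this toy setup the two averages coincide when the effect does not vary with cluster size and differ when it does, which is the sense in which cluster sizes are non-ignorable.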