Cluster randomized trials (CRTs) often enroll large numbers of participants, but due to logistical and fiscal challenges, only a subset of participants may be selected for measurement of certain outcomes, and those sampled may, purposely or not, be unrepresentative of all participants. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific outcomes and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters by necessity, limiting statistical power and raising concerns about finite-sample performance. Motivated by a sub-study of the SEARCH community randomized trial on the incidence of tuberculosis (TB) infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation (TMLE) to account for three sources of missingness: (1) sampling for the sub-study; (2) measurement of baseline status among those sampled; and (3) measurement of final status among those in the incidence cohort (i.e., persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which sub-units of the cluster can be considered the conditionally independent unit, which improves precision and statistical power but also causes the CRT to behave more like an observational study. Our application to the SEARCH sub-study highlights the impact of different assumptions on measurement and dependence, as well as the real-life gains of our approach for bias reduction and efficiency improvement.