For randomized controlled trials (RCTs) in which a single intervention is measured on multiple outcomes, researchers often apply a multiple testing procedure (such as Bonferroni or Benjamini-Hochberg) to adjust $p$-values. Such an adjustment reduces the likelihood of spurious findings, but it also reduces statistical power, sometimes substantially, lowering the probability of detecting effects when they do exist. This loss of power is frequently ignored in typical power analyses, because existing tools do not easily accommodate multiple testing procedures. We introduce the PUMP R package as a tool for analysts to estimate statistical power, minimum detectable effect size, and sample size requirements for multi-level RCTs with multiple outcomes. Multiple outcomes are accounted for in two ways. First, power estimates from PUMP properly account for the adjustment in $p$-values from applying a multiple testing procedure. Second, as researchers shift their focus from one outcome to multiple outcomes, different definitions of statistical power emerge. PUMP allows researchers to consider a variety of definitions of power, as some may be more appropriate than others for the goals of their study. The package estimates power for frequentist multi-level mixed effects models, and supports a variety of commonly used RCT designs, models, and multiple testing procedures. In addition to its main functionality of estimating power, minimum detectable effect size, and sample size requirements, the package allows the user to easily explore the sensitivity of these quantities to changes in underlying assumptions.
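To make the two points about multiple outcomes concrete, the following is a minimal Python sketch, not PUMP's implementation or API. It simulates independent, normally distributed test statistics for a hypothetical study with three outcomes, applies Bonferroni or Benjamini-Hochberg adjustments, and reports three illustrative definitions of power: the power for each individual outcome, the probability of detecting at least one effect, and the probability of detecting all effects. The function names, the independence of outcomes, the common effect size (`effect_z`), and the simple z-test are all assumptions made for illustration; PUMP itself targets multi-level mixed effects models.

```python
import random
from statistics import NormalDist


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def no_adjustment(pvals):
    """Return the raw p-values unchanged."""
    return list(pvals)


def simulate_power(adjust, n_outcomes=3, effect_z=2.8, alpha=0.05,
                   n_sims=4000, seed=1):
    """Estimate power by simulation under an illustrative setup.

    Assumption: each outcome's test statistic is an independent
    Normal(effect_z, 1) draw, tested with a two-sided z-test.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    individual_hits = [0] * n_outcomes
    at_least_one = 0
    all_outcomes = 0
    for _ in range(n_sims):
        z = [rng.gauss(effect_z, 1.0) for _ in range(n_outcomes)]
        raw_p = [2.0 * (1.0 - norm.cdf(abs(zi))) for zi in z]
        rejected = [q < alpha for q in adjust(raw_p)]
        for k, hit in enumerate(rejected):
            individual_hits[k] += hit
        at_least_one += any(rejected)
        all_outcomes += all(rejected)
    return {
        "individual": [h / n_sims for h in individual_hits],
        "at_least_one": at_least_one / n_sims,
        "all": all_outcomes / n_sims,
    }


if __name__ == "__main__":
    for name, fn in [("none", no_adjustment),
                     ("Bonferroni", bonferroni),
                     ("Benjamini-Hochberg", benjamini_hochberg)]:
        print(name, simulate_power(fn))
```

Because every call uses the same seed, the three procedures are compared on identical simulated data, so any difference in the estimates comes from the adjustment alone. Under these assumptions, the individual power after adjustment cannot exceed the unadjusted individual power, and the "at least one" and "all" definitions bracket it from above and below.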