Null hypothesis statistical significance testing (NHST) is the dominant approach for evaluating results from randomized controlled trials. Although NHST comes with long-run error rate guarantees, its main inferential tool -- the $p$-value -- is only an indirect measure of evidence against the null hypothesis. The main reason is that the $p$-value is computed under the assumption that the null hypothesis is true, whereas the likelihood of the data under any alternative hypothesis is ignored. If the goal is to quantify how much evidence the data provide for or against the null hypothesis, an alternative hypothesis must be specified (Goodman & Royall, 1988). Paradoxes arise when researchers interpret $p$-values as evidence. For instance, results that are surprising under the null may be equally surprising under a plausible alternative hypothesis, such that a $p=.045$ result (`reject the null') does not make the null any less plausible than it was before. Hence, $p$-values have been argued to overestimate the evidence against the null hypothesis. Conversely, statistically non-significant results (i.e., $p>.05$) can nevertheless provide some evidence in favor of the alternative hypothesis. It is therefore crucial for researchers to know when statistical significance and evidence collide, and this requires that a direct measure of evidence be computed and presented alongside the traditional $p$-value.
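The point that a result can be about as surprising under an alternative as under the null can be illustrated with a minimal sketch. It assumes a simple two-sided $z$-test and point alternatives for the mean of the test statistic; these choices, the function names `two_sided_p` and `likelihood_ratio`, and the alternative means 2 and 4 are illustrative assumptions, not the method of the source. The likelihood ratio used here is one possible direct measure of evidence.

```python
from statistics import NormalDist

# Sampling distribution of the z statistic under the null hypothesis H0.
std_normal = NormalDist(0.0, 1.0)

def two_sided_p(z):
    """Two-sided p-value of an observed z statistic under H0 (mean 0)."""
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))

def likelihood_ratio(z, alt_mean):
    """Density of z under a point alternative, z ~ N(alt_mean, 1),
    divided by its density under H0, z ~ N(0, 1)."""
    return NormalDist(alt_mean, 1.0).pdf(z) / std_normal.pdf(z)

# The z value whose two-sided p-value equals .045.
z_obs = std_normal.inv_cdf(1.0 - 0.045 / 2.0)

# Hypothetical alternatives: expected z of 2 (moderate) and 4 (large).
for alt_mean in (2.0, 4.0):
    print(
        f"alt_mean={alt_mean}: p={two_sided_p(z_obs):.3f}, "
        f"LR(H1:H0)={likelihood_ratio(z_obs, alt_mean):.2f}"
    )
```

Under these assumptions the same $p=.045$ result yields a likelihood ratio well above 1 for the moderate alternative but close to 1 for the large one. In the second case the data are nearly as improbable under the alternative as under the null, so they barely discriminate between the two hypotheses.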