Statistical insignificance does not suggest the absence of an effect, yet scientists must often use null results as evidence of a negligible (near-zero) effect size to falsify scientific hypotheses. Doing so requires assessing a result's null strength, defined as the evidence for a negligible effect size. Such an assessment would differentiate strong null results, which suggest a negligible effect size, from weak null results, which suggest a broad range of potential effect sizes. We propose the most difference in means ($\delta_M$) as a two-sample statistic that can both quantify null strength and perform a hypothesis test for negligible effect size. To facilitate consensus when interpreting results, our statistic allows scientists to conclude that a result has a negligible effect size using different thresholds, with no recalculation required. To assist with selecting a threshold, $\delta_M$ can also compare null strength between related results. Both $\delta_M$ and the relative form of $\delta_M$ outperform other candidate statistics in comparing null strength. We compile broadly related results and use the relative $\delta_M$ to compare null strength across different treatments, measurement methods, and experimental models. Reporting the relative $\delta_M$ may provide a technical solution to the file drawer problem by encouraging the publication of null and near-zero results.