Most scientific disciplines use significance testing to draw conclusions about experimental or observational data. This classical approach provides a theoretical guarantee for controlling the number of false positives across a set of hypothesis tests, making it an appealing framework for scientists seeking to limit the number of false effects or associations that they claim to observe. Unfortunately, this theoretical guarantee holds for few experiments in practice, and the true false positive rate (FPR) is often much higher than the nominal level. Scientists have considerable freedom in choosing the error rate to control, the tests to include in the adjustment, and the correction method, making strong error control difficult to attain. In addition, hypotheses are often tested after finding unexpected relationships or patterns, the data are analysed in several ways, and analyses may be run repeatedly as data accumulate. As a result, adjusted p-values are too small, incorrect conclusions are often reached, and results are harder to reproduce. In the following, I argue why the FPR is rarely controlled meaningfully and why shrinking parameter estimates is preferable to adjusting p-values.
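To make the contrast between the two routes concrete, below is a minimal sketch under simple assumptions not taken from the text: a batch of simulated two-sample experiments, Bonferroni as the multiplicity adjustment, and a James-Stein-style shrinkage factor standing in for a full hierarchical or Bayesian model. It is illustrative only, not a recommended analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate 20 two-sample experiments; most true effects are zero.
n_tests, m = 20, 30
true_effects = np.where(rng.random(n_tests) < 0.2, 0.5, 0.0)

pvals, estimates = [], []
for d in true_effects:
    a = rng.normal(0.0, 1.0, m)          # control group
    b = rng.normal(d, 1.0, m)            # treatment group
    t, p = stats.ttest_ind(b, a)
    pvals.append(p)
    estimates.append(b.mean() - a.mean())
pvals = np.array(pvals)
estimates = np.array(estimates)

# Classical route: adjust the p-values for multiplicity (Bonferroni here).
bonferroni = np.minimum(pvals * n_tests, 1.0)

# Alternative route: leave the p-values alone and shrink the estimates
# toward zero instead (crude James-Stein-style factor).
se2 = 2.0 / m                                 # sampling variance of each mean difference
tau2 = max(estimates.var() - se2, 0.0)        # rough between-experiment variance
shrunk = estimates * tau2 / (tau2 + se2)

print("raw p-values:       ", np.round(pvals, 3))
print("Bonferroni-adjusted:", np.round(bonferroni, 3))
print("raw estimates:      ", np.round(estimates, 2))
print("shrunken estimates: ", np.round(shrunk, 2))
```

The adjustment route changes only the declared significance of each test, while the shrinkage route changes the effect estimates themselves, pulling noisy ones toward zero; the argument developed below is for the latter.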