Since its debut in the 18th century, the P-value has been an important part of scientific discovery based on hypothesis testing. As the pace of statistical analysis accelerates, questions are being raised about the extent to which scientific discoveries based on P-values are reliable and reproducible, and calls to adjust the significance level or to ban the P-value altogether have grown louder. Inspired by these questions and discussions, we examine the useful roles and the misuses of the P-value in scientific studies. For common misuses and misinterpretations, we offer modest recommendations for practitioners. We also compare statistical significance with clinical relevance and, in parallel, review Bayesian alternatives for seeking evidence. Finally, we discuss the promises and risks of using meta-analysis to pool P-values from multiple studies and aggregate evidence. Taken together, the P-value underpins a useful probabilistic decision-making system and provides evidence on a continuous scale. Its interpretation, however, must be contextual, taking into account the scientific question, the experimental design (including model specification, sample size, and significance level), statistical power, effect size, and reproducibility.
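To make the idea of pooling P-values from multiple studies concrete, the sketch below uses Fisher's method, one classical combination rule. The abstract does not say which pooling method is meant, so the choice of Fisher's method, the function name `fisher_combined_p`, and the assumption that the studies are independent are all illustrative assumptions rather than the authors' procedure.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine P-values with Fisher's method (assumes independent studies)."""
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("each P-value must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i). Under the global null
    # hypothesis it follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total
```

For example, three hypothetical studies with P-values 0.08, 0.06, and 0.10 are each above 0.05, yet `fisher_combined_p([0.08, 0.06, 0.10])` returns roughly 0.018. This shows both the promise of pooling (weak but consistent evidence accumulates) and the risk (the result depends on the independence assumption and on which studies are included).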