Connecting Simple and Precise P-values to Complex and Ambiguous Realities

Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them are used to set policy. Among them are assumptions of human equipoise or unbiasedness in data generation, management, analysis, and reporting. These assumptions correspond to levels of cooperation, competence, neutrality, and integrity that are absent more often than we would like to believe. Given this harsh reality, we should ask what meaning, if any, we can assign to the P-values, 'statistical significance' declarations, 'confidence' intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data. By themselves, P-values and confidence intervals do not test any hypothesis, nor do they measure the significance of results or the confidence we should have in them. The impression that they do is an ongoing cultural error perpetuated by large segments of the statistical and research community via misleading terminology. So-called 'inferential' statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations.
