Connecting Simple and Precise P-values to Complex and Ambiguous Realities

Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them are used to set policy. Among them are that there is human equipoise or unbiasedness in data generation, management, analysis, and reporting. These assumptions correspond to levels of cooperation, competence, neutrality, and integrity that are absent more often than we would like to believe. Given this harsh reality, we should ask what meaning, if any, we can assign to the P-values, 'statistical significance' declarations, 'confidence' intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data. By themselves, P-values and CI do not test any hypothesis, nor do they measure the significance of results or the confidence we should have in them. The sense otherwise is an ongoing cultural error perpetuated by large segments of the statistical and research community via misleading terminology. So-called 'inferential' statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations.
