Hypothesis testing and confidence sets: why Bayesian not frequentist, and how to set a prior with a regulatory authority

From arXiv: 142 pages, 72 figures, 14 tables. Version 6 corrects figure 60 and adds a new example (section 6) illustrating conservation of Shannon information with Bayesian methods and its loss with frequentist methods.

We marshal the arguments for preferring Bayesian hypothesis testing and confidence sets to frequentist ones. We define admissible solutions to inference problems, noting that Bayesian solutions are admissible. We give seven weaker common-sense criteria for solutions to inference problems, all of which are failed by these frequentist methods but satisfied by any admissible method. We note that handicapping Bayesian methods so that they satisfy criteria on type I error rate yields pseudo-Bayesian methods that are frequentist, not Bayesian, in nature. We give five examples showing the differences between Bayesian and frequentist methods: the first requires little calculus; the second shows in the abstract what is wrong with these frequentist methods; the third illustrates information conservation; the fourth shows that the same problems arise in everyday statistical problems; and the fifth illustrates how, on some real-life inference problems, Bayesian methods require less data than fixed sample-size (resp. pseudo-Bayesian) frequentist hypothesis testing by factors exceeding 3000 (resp. 300), without recourse to informative priors. To address the issue of parties with opposing interests reaching agreement on a prior, we illustrate the beneficial effects of a Bayesian "Let the data decide" policy, both on results under a wide variety of conditions and on the motivation to reach a common prior by consent. We show that in general the frequentist confidence level contains less relevant Shannon information than the Bayesian posterior, and we give an example in which no deterministic frequentist critical regions give any relevant information even though the Bayesian posterior contains up to the maximum possible amount. In contrast, use of the Bayesian prior allows construction of non-deterministic critical regions for which the Bayesian posterior can be recovered from the frequentist confidence.
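
To make the contrast between a Bayesian posterior and a frequentist tail probability concrete, here is a minimal sketch in Python. It is not taken from the paper: the coin-tossing setup, the two simple hypotheses (success probability 0.5 versus 0.7), the equal prior weights, and the function names `binom_pmf`, `posterior_h0`, and `p_value_upper` are all assumptions chosen purely for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_h0(k, n, p0=0.5, p1=0.7, prior_h0=0.5):
    """Posterior probability of H0 (success rate p0) against H1 (success rate p1).

    Illustrative assumption: only two simple hypotheses, with prior weight
    prior_h0 on H0 and 1 - prior_h0 on H1.
    """
    like0 = binom_pmf(k, n, p0)
    like1 = binom_pmf(k, n, p1)
    return prior_h0 * like0 / (prior_h0 * like0 + (1 - prior_h0) * like1)


def p_value_upper(k, n, p0=0.5):
    """One-sided frequentist p-value: P(at least k successes | H0)."""
    return sum(binom_pmf(j, n, p0) for j in range(k, n + 1))


if __name__ == "__main__":
    k, n = 14, 20  # hypothetical data: 14 successes in 20 trials
    print("posterior P(H0 | data):", posterior_h0(k, n))
    print("one-sided p-value:     ", p_value_upper(k, n))
```

The two numbers answer different questions: the posterior is the probability of H0 given the observed data and the stated prior, whereas the p-value is the probability, under H0, of data at least as extreme as that observed.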

Hypothesis testing

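Hypothesis testing is a method in inferential statistics for testing statistical hypotheses. A "statistical hypothesis" is a scientific hypothesis that can be tested by observing a process modeled by a set of random variables. Once unknown parameters can be estimated, one wants to draw appropriate inferences about their true values from the results. A statistical hypothesis about parameters is a statement about one or more parameters. The statement whose correctness is to be tested is the null hypothesis; it is usually chosen by the researcher and reflects the researcher's view of the unknown parameter. The competing statement about the parameter is the alternative hypothesis, which usually reflects another (opposing) view of the parameter's possible values held by the researcher conducting the test (in other words, the alternative hypothesis is usually what the researcher most wants to learn about). Types of hypothesis test include the t-test, the Z-test, the chi-squared test, the F-test, and others.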
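
As a concrete instance of the null/alternative setup described above, the following minimal sketch runs a two-sided one-sample Z-test using only the Python standard library. The function name `z_test`, the assumption of a known population standard deviation, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test.

    Null hypothesis: the population mean equals mu0.
    Alternative hypothesis: the population mean differs from mu0.
    Assumes the population standard deviation sigma is known.
    Returns the Z statistic and the two-sided p-value.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 25 observations with mean 5.4, tested against mu0 = 5.0.
    z, p = z_test(sample_mean=5.4, mu0=5.0, sigma=1.0, n=25)
    print("z =", z, "p =", p)
```

A small p-value is conventionally read as evidence against the null hypothesis; the significance threshold itself is a choice made by the analyst before looking at the data.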
