Statistically significant results are rewarded more than insignificant ones, so researchers have an incentive to pursue statistical significance. Such p-hacking reduces the informativeness of hypothesis tests by making significant results much more common than they are supposed to be when there is no true effect. To address this problem, we construct critical values of test statistics such that, if these values are used to determine significance and researchers respond optimally to the new significance standards, significant results occur with the desired frequency. Because these incentive-compatible critical values account for p-hacking, they are larger than classical critical values. Using evidence from the social and medical sciences, we find that the incentive-compatible critical value for any test and any significance level is the classical critical value for the same test with approximately one fifth of the significance level, a form of Bonferroni correction. For instance, for a z-test with a significance level of 5%, the incentive-compatible critical value is 2.31 instead of 1.65 if the test is one-sided and 2.57 instead of 1.96 if the test is two-sided.
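The one-fifth rule of thumb for the z-test can be illustrated with a minimal sketch. It assumes the divisor is exactly 5, whereas the abstract says only "approximately one fifth", so it gives roughly 2.33 and 2.58 rather than the reported 2.31 and 2.57. The function names and the `divisor` parameter are hypothetical, introduced here for illustration only; this is not the construction used in the paper.

```python
from statistics import NormalDist


def classical_critical_value(alpha, two_sided=False):
    """Classical z-test critical value at significance level alpha."""
    # A two-sided test splits alpha equally between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def approx_incentive_compatible_critical_value(alpha, two_sided=False, divisor=5.0):
    """Rule-of-thumb approximation: classical critical value at alpha / divisor.

    divisor=5.0 is an assumption standing in for "approximately one fifth".
    """
    return classical_critical_value(alpha / divisor, two_sided)


if __name__ == "__main__":
    alpha = 0.05
    for two_sided in (False, True):
        label = "two-sided" if two_sided else "one-sided"
        classical = classical_critical_value(alpha, two_sided)
        adjusted = approx_incentive_compatible_critical_value(alpha, two_sided)
        print(f"{label}: classical {classical:.2f}, one-fifth rule {adjusted:.2f}")
```

The small gap between these values and the figures quoted in the abstract is consistent with the adjustment being only approximately one fifth of the significance level.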