It is a central dogma in science that the results of a study should be replicable, yet only 90 of 190 replication attempts were successful. We attribute a substantial part of the problem to selective inference evident within the papers: the practice of selecting and highlighting some results out of the many. Analyzing the 100 papers of the Reproducibility Project in Psychology, we found that reporting many results per paper is common (77.7 on average) and that the selection from these multiple results is not adjusted for. We propose to account for selection using the hierarchical false discovery rate (FDR) controlling procedure TreeBH of Bogomolov et al. (2020), which exploits hierarchical structure to gain power. The results that remained statistically significant after adjustment included 31 of the 32 replicable results (97%); conversely, only 1 of the 21 results that were not significant after adjustment was replicated. Given the easy deployment of adjustment tools and the minor loss of power involved, we argue that addressing multiplicity is an essential missing component in experimental psychology and should become a required part of the arsenal of replicability-enhancing methodologies in the field.
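The hierarchical TreeBH procedure builds on the Benjamini-Hochberg (BH) step-up rule, applied within branches of a tree of hypotheses. As a rough illustration only (a minimal sketch, not the authors' TreeBH implementation), the BH building block can be written as:

```python
# Minimal sketch of the Benjamini-Hochberg (BH) step-up procedure,
# the building block that TreeBH applies within each branch of the
# hypothesis tree. Illustration only; not the TreeBH algorithm itself.

def bh_adjust(pvals, q=0.05):
    """Return indices of p-values rejected at FDR level q (step-up rule)."""
    m = len(pvals)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])

# Example: five p-values reported in one hypothetical paper.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
print(bh_adjust(pvals, q=0.05))  # -> [0, 1]
```

Note that with the same five p-values, naive per-test thresholding at 0.05 would declare four findings significant, whereas the BH adjustment rejects only two, illustrating the selection effect the abstract describes.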