Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect

Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CIs), have been reported to be prone to dichotomous interpretations, especially within the null hypothesis significance testing framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo do not overlap, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations, but their effectiveness has also been challenged. We ran two experiments on researchers with expertise in statistical analysis to compare several alternative representations of confidence intervals, and used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked the respondents for their opinions and preferences regarding representation styles. Our results suggest that adding visual information to the classic CI representation can decrease the tendency towards dichotomous interpretations, measured as the 'cliff effect' (the sudden drop in confidence around a p-value of 0.05), compared with the classic CI visualization and a textual representation of the CI with p-values. All data and analyses are publicly available at https://github.com/helske/statvis.
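
To make the dichotomy concrete, the following minimal sketch compares two made-up results whose estimates differ only slightly but which fall on opposite sides of the 0.05 threshold. It is an illustration only, not the authors' analysis: the function name `z_summary`, the numbers, and the normal (z-test) approximation are all assumptions introduced here.

```python
from statistics import NormalDist


def z_summary(estimate, std_error, level=0.95):
    """Return the two-sided p-value and the CI for an estimate.

    Assumes a normal approximation (z-test) against a null value of zero;
    this is an illustrative simplification, not the method used in the study.
    """
    std_normal = NormalDist()
    z = estimate / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    half_width = std_normal.inv_cdf(0.5 + level / 2) * std_error
    return p_value, (estimate - half_width, estimate + half_width)


# Two hypothetical results with the same standard error and similar estimates.
for estimate in (2.06, 1.88):
    p, (lo, hi) = z_summary(estimate, std_error=1.0)
    print(f"estimate={estimate:.2f}  p={p:.3f}  95% CI=({lo:.2f}, {hi:.2f})")
```

Under these assumptions the two intervals have the same width and almost the same location, yet one p-value falls below 0.05 and the other above it. A reader who treats 0.05 as a hard boundary would draw opposite conclusions from nearly identical evidence, which is the pattern the cliff effect describes.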
