There are more than 55 different ways to construct a confidence or credible interval (CI) for the binomial proportion. Methods for comparing them are needed to decide which should be used in practice. The interval score has been proposed for comparing prediction intervals. It is a proper scoring rule that combines coverage, as a measure of calibration, and width, as a measure of sharpness. We evaluate eleven CIs for the binomial proportion based on the expected interval score and propose a summary measure that can account for different weightings of the underlying true proportion. Under uniform weighting, the expected interval score recommends the Wilson CI or Bayesian credible intervals with a uniform prior. If extremely low or high proportions receive more weight, the score recommends Bayesian credible intervals based on Jeffreys' prior. While more work is needed to justify theoretically the use of the interval score for comparing CIs, our results suggest that it is a useful way to combine coverage and width in a single measure. This novel approach could also be used in other applications.
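The following Python code is a minimal sketch of how an expected interval score could be computed. It is not the authors' implementation. For a central (1 - alpha) interval [l, u] and true value x, the interval score is (u - l) + (2/alpha)(l - x)·1{x < l} + (2/alpha)(x - u)·1{x > u}, and lower is better. The sketch makes several assumptions:

- The true value is the proportion p itself, and the expectation is taken over X ~ Binomial(n, p).
- The summary measure is approximated by a weighted average of the expected score over a midpoint grid of p values.
- Only the Wald and Wilson intervals are implemented, because they have closed forms.

All function names and the extreme-proportion weight are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist
from typing import Callable, Tuple

# Illustrative sketch only: function names, the two CI methods and the
# weighting scheme are assumptions, not the paper's implementation.
Interval = Tuple[float, float]
CIMethod = Callable[[int, int, float], Interval]


def interval_score(lower: float, upper: float, truth: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) interval; lower is better.

    Width measures sharpness; a miss adds (2 / alpha) times the distance
    from the true value to the nearest interval bound.
    """
    score = upper - lower
    if truth < lower:
        score += (2.0 / alpha) * (lower - truth)
    elif truth > upper:
        score += (2.0 / alpha) * (truth - upper)
    return score


def _z(alpha: float) -> float:
    """Standard normal quantile z_{1 - alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def wald_ci(x: int, n: int, alpha: float) -> Interval:
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), not truncated."""
    z, p_hat = _z(alpha), x / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_ci(x: int, n: int, alpha: float) -> Interval:
    """Wilson score interval for x successes out of n trials."""
    z = _z(alpha)
    denom = n + z * z
    center = (x + z * z / 2.0) / denom
    half = (z / denom) * math.sqrt(x * (n - x) / n + z * z / 4.0)
    return center - half, center + half


def expected_interval_score(ci: CIMethod, n: int, p: float, alpha: float) -> float:
    """E[IS] over X ~ Binomial(n, p), each interval scored against the true p."""
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p**x * (1.0 - p) ** (n - x)
        lower, upper = ci(x, n, alpha)
        total += prob * interval_score(lower, upper, p, alpha)
    return total


def weighted_summary(ci: CIMethod, n: int, alpha: float,
                     weight: Callable[[float], float], grid_size: int = 200) -> float:
    """Weighted mean of the expected interval score over a midpoint grid on (0, 1)."""
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]
    weights = [weight(p) for p in grid]
    scores = [expected_interval_score(ci, n, p, alpha) for p in grid]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def uniform_weight(p: float) -> float:
    """Uniform weighting of the true proportion."""
    return 1.0


def extreme_weight(p: float) -> float:
    """Hypothetical weight that up-weights p near 0 and 1 (Beta(1/2, 1/2) shape)."""
    return 1.0 / math.sqrt(p * (1.0 - p))


if __name__ == "__main__":
    for name, method in [("Wald", wald_ci), ("Wilson", wilson_ci)]:
        uni = weighted_summary(method, n=20, alpha=0.05, weight=uniform_weight)
        ext = weighted_summary(method, n=20, alpha=0.05, weight=extreme_weight)
        print(f"{name}: uniform={uni:.4f}, extremes={ext:.4f}")
```

The 2/alpha factor makes a miss expensive, so an interval cannot score well just by being narrow. The Wald interval at x = 0 has zero width, but it is heavily penalized whenever p > 0. The Bayesian intervals are omitted because they need Beta quantiles. Any further interval method, for example a Jeffreys interval built on a Beta quantile function such as SciPy's `scipy.stats.beta.ppf`, can be plugged in if it matches the `(x, n, alpha) -> (lower, upper)` signature.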