Inter-rater reliability (IRR) has been the prevalent measure of quality and precision for ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. In this work, we outline how to transform selection procedures into a binary classification framework and develop a quantile approximation that connects a measurement model for the ratings with the binary classification framework. The quantile approximation allows us to estimate the probability of correctly selecting the best applicants and to assess error probabilities when evaluating the quality of selection procedures based on ratings from multiple raters. We draw connections between inter-rater reliability and binary classification metrics, showing that the binary classification metrics depend solely on the IRR coefficient and the proportion of selected applicants. We assess the performance of the quantile approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures.
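The following is a minimal simulation sketch, not the authors' implementation, of the binary classification framing described above. It assumes a one-way random-effects measurement model (rating = latent quality + error), defines the "truly best" applicants as the top proportion by latent quality, and treats selection of the top proportion by mean rating as a binary classifier; the function name and parameter choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def selection_sensitivity(irr_single, n_raters, prop_selected,
                          n_applicants=200, n_sim=2000):
    """Approximate the probability of correctly selecting the best applicants
    under an assumed one-way random-effects model: rating_ij = theta_i + e_ij,
    where IRR of a single rating = var_theta / (var_theta + var_error)."""
    var_theta = irr_single            # scale so that var_theta + var_error = 1
    var_error = 1.0 - irr_single
    n_best = int(round(prop_selected * n_applicants))
    sens = []
    for _ in range(n_sim):
        theta = rng.normal(0.0, np.sqrt(var_theta), n_applicants)
        ratings = theta[:, None] + rng.normal(
            0.0, np.sqrt(var_error), (n_applicants, n_raters))
        mean_rating = ratings.mean(axis=1)
        truly_best = np.argsort(theta)[-n_best:]        # top p by latent quality
        selected = np.argsort(mean_rating)[-n_best:]    # top p by observed mean rating
        # sensitivity: share of the truly best applicants that were selected
        sens.append(len(np.intersect1d(truly_best, selected)) / n_best)
    return float(np.mean(sens))

# Illustration: sensitivity increases with the IRR of a single rating
for irr in (0.2, 0.5, 0.8):
    print(irr, round(selection_sensitivity(irr, n_raters=3, prop_selected=0.2), 3))
```

In this simplified setting, the resulting classification metrics vary only with the reliability of the (mean) rating and the proportion of applicants selected, which mirrors the dependence on the IRR coefficient and selection proportion stated in the abstract.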