Inter-rater reliability (IRR), a prerequisite for high-quality ratings and assessments, may be affected by contextual variables such as the rater's or ratee's gender, major, or experience. Identifying such sources of heterogeneity in IRR is important for implementing policies that can decrease measurement error and increase IRR by focusing on the most relevant subgroups. In this study, we propose a flexible approach for assessing IRR in cases of heterogeneity due to covariates by directly modeling differences in variance components. We use Bayes factors to select the best-performing model, and we suggest Bayesian model averaging as an alternative approach for obtaining IRR and variance component estimates that accounts for model uncertainty. We use inclusion Bayes factors, which consider the whole model space, to provide evidence for or against differences in variance components due to covariates. The proposed method is compared with other Bayesian and frequentist approaches in a simulation study, and we demonstrate its superiority in some situations. Finally, we provide real data examples from grant proposal peer review, demonstrating the usefulness of this method and its flexibility in generalizing to more complex designs.
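To make the quantities named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the single-rating IRR is the share of total variance attributable to ratees, that posterior model probabilities are obtained from log marginal likelihoods and prior model probabilities, and that the inclusion Bayes factor is the ratio of posterior to prior inclusion odds. The four-model space, the log marginal likelihoods, and the per-model IRR values are hypothetical numbers chosen only for illustration.

```python
import math


def irr(var_ratee, var_residual):
    """Single-rating IRR: share of total variance due to ratees (assumed definition)."""
    return var_ratee / (var_ratee + var_residual)


def posterior_model_probs(log_marginals, priors):
    """Posterior model probabilities from log marginal likelihoods and prior probabilities."""
    log_post = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_bayes_factor(post, priors, includes):
    """Posterior inclusion odds divided by prior inclusion odds for one effect."""
    post_in = sum(p for p, inc in zip(post, includes) if inc)
    prior_in = sum(p for p, inc in zip(priors, includes) if inc)
    return (post_in / (1.0 - post_in)) / (prior_in / (1.0 - prior_in))


def model_average(post, estimates):
    """Model-averaged estimate: per-model estimates weighted by posterior model probability."""
    return sum(p * e for p, e in zip(post, estimates))


# Hypothetical model space for one binary covariate (e.g. two ratee groups):
#   M0: no variance component differs between groups
#   M1: only the ratee variance differs
#   M2: only the residual variance differs
#   M3: both differ
priors = [0.25, 0.25, 0.25, 0.25]
log_marginals = [-512.3, -509.8, -511.9, -510.1]  # made-up values
post = posterior_model_probs(log_marginals, priors)

# Evidence for a group difference in the ratee variance (present in M1 and M3).
bf_incl_ratee = inclusion_bayes_factor(post, priors, [False, True, False, True])

# Made-up per-model IRR estimates for one group, combined by model averaging.
irr_group_a = [irr(1.0, 1.5), irr(1.4, 1.5), irr(1.0, 1.2), irr(1.4, 1.2)]
irr_bma = model_average(post, irr_group_a)

print("posterior model probabilities:", [round(p, 3) for p in post])
print("inclusion BF (ratee variance):", round(bf_incl_ratee, 3))
print("model-averaged IRR, group A:", round(irr_bma, 3))
```

An inclusion Bayes factor above 1 indicates that the data shift belief toward models in which the variance component differs between groups, and a value below 1 indicates evidence against such a difference. In practice the marginal likelihoods would come from fitted models rather than being typed in by hand.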