A common task in many disciplines is assigning each item in a set to one of several categories or classes with known labels. This is often done by one or more expert raters, or sometimes by an automated process. If these assignments, or 'ratings', are difficult to make, a common tactic is to have different raters repeat them, or to have the same rater repeat them on different occasions. We present an R package, rater, available on CRAN, that implements Bayesian versions of several statistical models for analysing repeated categorical rating data. These models support inference about both the true underlying (latent) class of each item and the accuracy of each rater. The models are based on, and include, the Dawid-Skene model, and we implemented them using the Stan probabilistic programming language. We illustrate usage of rater through a few examples. We also discuss in detail the techniques of marginalisation and conditioning, which are necessary for these models but also apply more generally to other models implemented in Stan.
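As a rough illustration of what marginalisation and conditioning mean for the Dawid-Skene model, the sketch below writes out both steps. The notation is an assumption made for this illustration and is not taken from the paper: $z_i$ is the latent class of item $i$, $y_{ij}$ is the rating given to item $i$ by rater $j$, $\pi_k$ is the prevalence of class $k$, and $\theta_{j,k,l}$ is the probability that rater $j$ gives rating $l$ to an item whose true class is $k$. It also assumes every rater rates every item exactly once.

```latex
% Assumed generative model (notation is illustrative, not the paper's):
%   z_i ~ Categorical(\pi),   y_{ij} | z_i = k ~ Categorical(\theta_{j,k,\cdot})

% Marginalisation: sum the discrete latent class z_i out of the likelihood,
% leaving a function of the continuous parameters only.
\[
  p(y_i \mid \pi, \theta)
    = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \theta_{j,k,y_{ij}}
\]

% Conditioning: recover the distribution of the latent class given the
% ratings and the continuous parameters.
\[
  p(z_i = k \mid y_i, \pi, \theta)
    = \frac{\pi_k \prod_{j=1}^{J} \theta_{j,k,y_{ij}}}
           {\sum_{k'=1}^{K} \pi_{k'} \prod_{j=1}^{J} \theta_{j,k',y_{ij}}}
\]
```

The first expression is needed because Stan's samplers operate on continuous parameters only, so discrete latent variables such as $z_i$ have to be summed out. The second can then be evaluated at each posterior draw of $(\pi, \theta)$ and averaged to estimate the posterior class probabilities for each item.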