The notion of experiment precision quantifies the variance of user ratings in a subjective experiment. Although measures that assess subjective experiment precision exist, no systematic analyses of these measures are available in the literature. To the best of our knowledge, there is also no systematic framework in the Multimedia Quality Assessment (MQA) field for comparing subjective experiments in terms of their precision. The main idea of this paper is therefore to propose a framework for comparing subjective experiments in the field of MQA based on appropriate experiment precision measures. We present three experiment precision measures and three related experiment precision comparison methods, and we systematically analyse their performance. We do so both through a simulation study (varying user rating variance and bias) and by using data from four real-world Quality of Experience (QoE) subjective experiments. In the simulation study we focus on crowdsourcing QoE experiments, since they are known to generate ratings with higher variance and bias than traditional subjective experiment methodologies. We conclude that our proposed measures and related comparison methods properly capture experiment precision, both when tested on simulated and on real-world data. One of the measures also proves capable of handling even significantly biased responses. We believe our experiment precision assessment framework will help compare different subjective experiment methodologies; for example, it may help decide which methodology yields more precise user ratings. This may in turn inform future standardisation activities.
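The abstract does not define the three proposed measures, so the following is only a minimal Python sketch of the kind of simulation it describes: ratings generated with controllable per-rating noise (variance) and per-user bias. It assumes a 5-point ACR scale and uses a simple, hypothetical precision indicator, the mean per-stimulus sample standard deviation. This indicator is not one of the paper's measures; `simulate_ratings` and `mean_rating_sd` are illustrative names.

```python
import random
import statistics


def simulate_ratings(true_scores, n_users, noise_sd, bias_sd, seed=0):
    """Hypothetical rating model (assumption, not the paper's model):
    rating = true score + per-user bias + per-rating noise,
    rounded to an integer and clipped to a 5-point scale (1..5).
    Returns one list of n_users ratings per stimulus."""
    rng = random.Random(seed)
    # Each simulated user carries a constant bias across all stimuli.
    biases = [rng.gauss(0.0, bias_sd) for _ in range(n_users)]
    ratings = []
    for mu in true_scores:
        row = []
        for b in biases:
            r = round(mu + b + rng.gauss(0.0, noise_sd))
            row.append(min(5, max(1, r)))  # clip to the rating scale
        ratings.append(row)
    return ratings


def mean_rating_sd(ratings):
    """Illustrative precision indicator: average of the per-stimulus sample
    standard deviations. Lower values mean less dispersed, i.e. more precise,
    ratings."""
    return statistics.mean(statistics.stdev(row) for row in ratings)


if __name__ == "__main__":
    scores = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
    # Assumed parameter sets: a lab-like setting vs a noisier, more biased
    # crowdsourcing-like setting.
    lab = simulate_ratings(scores, n_users=30, noise_sd=0.5, bias_sd=0.2, seed=1)
    crowd = simulate_ratings(scores, n_users=30, noise_sd=1.0, bias_sd=0.6, seed=2)
    print("lab-like:  ", round(mean_rating_sd(lab), 3))
    print("crowd-like:", round(mean_rating_sd(crowd), 3))
```

Raising `noise_sd` or `bias_sd` should generally raise the indicator. Note that a plain dispersion statistic like this one mixes rater bias into the spread, which is one reason a measure that is robust to biased responses is worth having.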