Traditionally, the Quality of Experience (QoE) of a communication system is evaluated through subjective tests. The most common test method for speech QoE is Absolute Category Rating (ACR), in which participants listen to a set of stimuli, processed by the test conditions under study, and rate the perceived quality of each stimulus on a specific scale. Comparison Category Rating (CCR) is another standard approach, in which participants listen to both a reference and a processed stimulus and rate the quality of one relative to the other. The CCR method is particularly suitable for systems that improve the quality of the input speech. This paper evaluates an adaptation of the CCR test procedure for assessing speech quality in a crowdsourcing setup. The CCR method was introduced in ITU-T Rec. P.800 for laboratory-based experiments; we adapted it for crowdsourcing following the guidelines of ITU-T Rec. P.800 and P.808. We show that the results of the CCR procedure via crowdsourcing are highly reproducible. We also compared the CCR test results with those of the widely used ACR test procedure obtained both in the laboratory and via crowdsourcing. Our results show that the CCR procedure in crowdsourcing is a reliable and valid test method.
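As a minimal illustration of the CCR aggregation described above (not code from the paper): in ITU-T P.800, each CCR vote is an integer on the scale from -3 ("much worse") to +3 ("much better") comparing the processed clip against the reference, and votes are averaged per condition into a Comparison MOS (CMOS). The condition names and vote values below are hypothetical.

```python
# Sketch: aggregating CCR votes (ITU-T P.800 scale, -3..+3) into a
# per-condition Comparison MOS (CMOS). Condition names and votes are
# hypothetical, for illustration only.
from statistics import mean

def cmos_per_condition(votes):
    """votes: dict mapping condition name -> list of CCR votes in [-3, 3]."""
    result = {}
    for condition, scores in votes.items():
        if any(not -3 <= s <= 3 for s in scores):
            raise ValueError(f"out-of-range CCR vote for condition {condition}")
        result[condition] = mean(scores)  # CMOS = arithmetic mean of votes
    return result

votes = {
    "enhancer_A": [2, 1, 3, 2, 1],    # mostly "better than reference"
    "enhancer_B": [-1, 0, 1, 0, -2],  # near "about the same"
}
print(cmos_per_condition(votes))
```

A positive CMOS indicates the processed signal was, on average, preferred over the reference, which is why CCR suits enhancement systems: unlike ACR, it can directly express an improvement over the input.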