In many research areas, for example motion and gesture generation, objective measures alone do not provide an accurate impression of key stimulus traits such as perceived quality or appropriateness. The gold standard is instead to evaluate these aspects through user studies, especially subjective evaluations of video stimuli. Common evaluation paradigms either present individual stimuli to be scored on Likert-type scales, or ask users to compare and rate videos in a pairwise fashion. However, the time and resources required for such evaluations scale poorly as the number of conditions to be compared increases. Building on standards used for evaluating the quality of multimedia codecs, this paper instead introduces a framework for granular rating of multiple comparable videos in parallel. In effect, this methodology compares all pairs of conditions at once. Our contributions are 1) a proposed framework, called HEMVIP, for parallel and granular evaluation of multiple video stimuli, and 2) a validation study confirming that results obtained using the tool are in close agreement with results of prior studies using conventional multiple pairwise comparisons.
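To make the scaling argument concrete, the sketch below counts how many distinct condition pairs a pairwise study must cover, and shows how a single parallel rating screen implies a preference for every one of those pairs. It is an illustration under stated assumptions, not the HEMVIP implementation or its analysis: the 0-100 slider, the condition labels, and the helper names `pairwise_trials` and `preferences_from_ratings` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import comb


def pairwise_trials(n_conditions: int) -> int:
    """Distinct condition pairs a pairwise study must present per stimulus: n choose 2."""
    return comb(n_conditions, 2)


def preferences_from_ratings(ratings: dict[str, float]) -> dict[tuple[str, str], int]:
    """Derive a preference for every condition pair from one parallel rating screen.

    Hypothetical helper: returns +1 if the first condition of a pair was rated
    higher, -1 if it was rated lower, and 0 for a tie.
    """
    prefs: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(ratings), 2):
        diff = ratings[a] - ratings[b]
        # Sign of the rating difference, as an int.
        prefs[(a, b)] = int(diff > 0) - int(diff < 0)
    return prefs


# Pairwise workload grows quadratically with the number of conditions.
for n in (2, 5, 10):
    print(n, pairwise_trials(n))  # prints "2 1", "5 10", "10 45"

# Hypothetical ratings (assumed 0-100 slider) for one screen with four conditions.
screen = {"A": 72.0, "B": 55.0, "C": 55.0, "D": 90.0}
print(preferences_from_ratings(screen))
```

One screen of four ratings yields preferences for all six pairs, whereas a pairwise design would need six separate trials for the same information; with ten conditions the gap widens to one screen versus 45 trials.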