We address speaker comparison by listening in a game-like environment, which we hypothesize makes the task more motivating for naive listeners. We present the same 30 trials, selected from VoxCeleb with the help of an x-vector speaker recognition system, to a total of 150 crowdworkers recruited through Amazon's Mechanical Turk. The crowdworkers are divided into three cohorts of 50, each using one of three alternative interface designs: (i) a traditional (nongamified) design; (ii) a gamified design with feedback on decisions, along with points, game level indications, and the option to customize the interface; (iii) another gamified design with the additional constraint of a maximum of 5 'lives', which are consumed by wrong answers. We analyze the impact of these interface designs on listener error rates (both misses and false alarms), probability calibration, and time of quitting, along with responses to a survey questionnaire. The results indicate improved performance from (i) to (ii) and (iii), particularly in terms of balancing the two types of detection errors.
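The two detection error types mentioned above can be illustrated with a minimal sketch. It assumes each trial has a ground-truth label (same speaker or different speakers) and a binary listener decision; the function name, variable names, and toy data below are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def detection_error_rates(
    same_speaker: Sequence[bool], said_same: Sequence[bool]
) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) for one listener.

    A miss is a same-speaker trial judged "different";
    a false alarm is a different-speaker trial judged "same".
    """
    if len(same_speaker) != len(said_same):
        raise ValueError("labels and decisions must have equal length")

    n_target = sum(same_speaker)                 # same-speaker trials
    n_nontarget = len(same_speaker) - n_target   # different-speaker trials

    pairs = list(zip(same_speaker, said_same))
    misses = sum(truth and not decision for truth, decision in pairs)
    false_alarms = sum((not truth) and decision for truth, decision in pairs)

    # Rates are undefined when a trial class is absent.
    miss_rate = misses / n_target if n_target else float("nan")
    fa_rate = false_alarms / n_nontarget if n_nontarget else float("nan")
    return miss_rate, fa_rate


# Toy data (hypothetical): 3 same-speaker and 2 different-speaker trials.
truth = [True, True, True, False, False]
decisions = [True, False, True, True, False]
miss, fa = detection_error_rates(truth, decisions)
print(f"miss rate = {miss:.3f}, false alarm rate = {fa:.3f}")
```

A listener whose two rates are close to each other makes balanced errors in the sense used above, whereas a large gap indicates a bias toward answering "same" or "different".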