When a speaker verification (SV) system is deployed in a commercial application, customers should have an inclusive experience irrespective of their gender, age, or ethnicity. In this paper, we analyze the impact of gender and age on SV and find that, when a common desired False Acceptance Rate (FAR) is imposed across gender and age groups, the False Rejection Rate (FRR) differs from group to group. To optimize FRR for all users at a desired FAR, we propose a context-adaptive (e.g., gender- or age-adaptive) thresholding framework for SV. In many practical applications, the context is available as prior information. We also propose a concatenated gender/age detection model to derive the context algorithmically in the absence of such prior information. We show experimentally that our context-adaptive thresholding method is effective in building a more efficient, inclusive SV system. Specifically, we show that gender-specific thresholds reduce FRR for a specific gender at a desired FAR on the VoxCeleb1 test set. A similar analysis on the OGI Kids' Speech Corpus shows that an age-specific threshold significantly reduces FRR for certain age groups at a desired FAR.
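The idea of context-adaptive thresholding can be illustrated with a minimal sketch. It is not the authors' implementation: the scores are synthetic, the group labels `group_a` and `group_b` are hypothetical stand-ins for a context such as gender or age band, and the helpers `threshold_at_far` and `far_frr` are names introduced here for illustration only. The sketch assumes a trial is accepted when its score is strictly greater than the threshold, and it compares one global threshold against per-context thresholds, each calibrated to the same target FAR.

```python
import numpy as np

def threshold_at_far(impostor_scores, target_far):
    """Return a threshold whose empirical FAR (score > threshold) is at most target_far."""
    s = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending order
    k = min(int(np.floor(target_far * len(s))), len(s) - 1)
    return float(s[k])  # at most k impostor scores lie strictly above s[k]

def far_frr(target_scores, impostor_scores, threshold):
    """Empirical FAR and FRR for the decision rule: accept if score > threshold."""
    far = float(np.mean(np.asarray(impostor_scores) > threshold))
    frr = float(np.mean(np.asarray(target_scores) <= threshold))
    return far, frr

# Synthetic scores for illustration only (assumption: not data from the paper).
# Each context maps to (target-trial scores, impostor-trial scores).
rng = np.random.default_rng(0)
trials = {
    "group_a": (rng.normal(0.70, 0.10, 2000), rng.normal(0.20, 0.10, 20000)),
    "group_b": (rng.normal(0.60, 0.10, 2000), rng.normal(0.30, 0.10, 20000)),
}
TARGET_FAR = 0.01

# Baseline: one threshold calibrated on impostor scores pooled over all contexts.
pooled_impostors = np.concatenate([imp for _, imp in trials.values()])
global_thr = threshold_at_far(pooled_impostors, TARGET_FAR)

# Context-adaptive: one threshold per context, each calibrated to the same FAR.
per_ctx_thr = {ctx: threshold_at_far(imp, TARGET_FAR)
               for ctx, (_, imp) in trials.items()}

for ctx, (tgt, imp) in trials.items():
    g_far, g_frr = far_frr(tgt, imp, global_thr)
    c_far, c_frr = far_frr(tgt, imp, per_ctx_thr[ctx])
    print(f"{ctx}: global FAR={g_far:.4f} FRR={g_frr:.4f} | "
          f"adaptive FAR={c_far:.4f} FRR={c_frr:.4f}")
```

With a single global threshold, the per-group FAR is generally not equal to the target, because the pooled calibration mixes groups with different score distributions. Per-context thresholds keep each group's empirical FAR at or below the target, so the FRR values can be compared at a common operating point. In a deployed system the context label would come from prior information or from a detection model, which this sketch does not cover.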