Fairness issues arise when automatic speech recognition (ASR) systems do not perform equally well for all subgroups of the population. In any fairness measurement study for ASR, three open questions are especially important: how to control for nuisance factors, how to handle unobserved heterogeneity across speakers, and how to trace the source of any word error rate (WER) gap among subgroups. If these are not appropriately accounted for, incorrect conclusions will be drawn. In this paper, we introduce mixed-effects Poisson regression to better measure and interpret WER differences among subgroups of interest. In particular, the presented method effectively addresses the three problems raised above and is flexible enough for use in practical disparity analyses. We demonstrate the validity of the proposed model-based approach on both synthetic and real-world speech data.
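For orientation, a mixed-effects Poisson regression for WER can be written in the following generic form. This is a sketch under stated assumptions, not the specification used in the paper: the abstract does not give the model, so the symbols here (the error count $Y_{ij}$, the reference word count $N_{ij}$ used as an offset, the subgroup indicator $g_i$, the nuisance covariates $\mathbf{z}_{ij}$, and the speaker-level random intercept $r_i$) are illustrative choices.

```latex
% Assumed notation: speaker i, utterance j.
% Y_{ij}: number of word errors; N_{ij}: number of reference words (offset).
% g_i: subgroup indicator; z_{ij}: nuisance covariates; r_i: speaker random intercept.
\[
\begin{aligned}
Y_{ij} \mid r_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \log N_{ij} + \beta_0 + \beta_1 g_i
                 + \boldsymbol{\gamma}^{\top} \mathbf{z}_{ij} + r_i, \\
r_i &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
\]
```

Under this assumed form, the fixed effects $\boldsymbol{\gamma}$ absorb the nuisance factors, the random intercept $r_i$ captures unobserved heterogeneity across speakers, and $\exp(\beta_1)$ can be read as the ratio of expected WER between the two subgroups with the covariates and speaker effect held fixed.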