Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, as they do for the general population. However, it is unclear how badly these systems fail, how they can be improved, or how much people want to use them. In this paper, we first address these questions using results from a 61-person survey of people who stutter and find that participants want to use speech recognition but are frequently cut off or misunderstood, or receive speech predictions that do not represent their intent. In a second study, in which 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.
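For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work; the function names and the example transcripts are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Standard textbook definition; assumed here, not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before


# Hypothetical example: a word repetition transcribed literally counts as
# one inserted word against a six-word reference.
wer = word_error_rate("set a timer for ten minutes",
                      "set a a timer for ten minutes")
print(f"{wer:.3f}")  # 0.167

# Arithmetic on the two figures quoted above (25.4% and 9.9%).
print(f"{relative_reduction(25.4, 9.9):.3f}")  # 0.610
```

Under this definition, the change from 25.4% to 9.9% is a drop of 15.5 percentage points, or roughly 61% in relative terms.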