Speech is the fundamental means of communication between humans. The advent of AI and sophisticated speech technologies has led to the rapid proliferation of human-computer interaction, fueled primarily by Automatic Speech Recognition (ASR) systems. ASR systems take human speech in the form of audio and convert it into words, but for some users they cannot decode the speech, and the output text is so filled with errors that it is incomprehensible to a human reader. These systems do not work equally well for everyone and actually hinder the productivity of some users. In this paper, we present research that addresses ASR biases related to gender, race, illness, and disability, and we explore studies that propose ASR debiasing techniques for mitigating this discrimination. We also discuss techniques for designing more accessible and inclusive ASR technology. For each approach surveyed, we summarize the investigation and methods applied, the ASR systems and corpora used, and the research findings, and we highlight its strengths and/or weaknesses. Finally, we propose future opportunities for Natural Language Processing researchers to explore in creating the next generation of ASR technologies.