Automatic speech recognition (ASR) encounters increasingly informal and free-form input as voice user interfaces and conversational agents, such as the voice assistants Alexa and Google Home, gain popularity. Conversational speech is both the most difficult and the most environmentally relevant type of data for speech recognition. In this paper, we take a linguistic perspective and use French as a case study on the disambiguation of French homophones. Our contribution aims to provide more insight into the accuracy of human speech transcription under conditions that reproduce those of state-of-the-art ASR systems, although in a much more focused setting. The case study involves the most common errors encountered in the automatic transcription of French.