In recent years, ASR systems have reached remarkable performance on specific tasks for which sufficient training data is available, such as LibriSpeech. However, varying acoustic and recording conditions, diverse speaking styles, and a lack of sufficient in-domain training data still pose challenges to the development of accurate models. In this work, we present our efforts to develop ASR systems for a conversational telephone speech translation task in the medical domain for three languages (Arabic, German, Vietnamese), intended to support emergency room interaction between physician and patient across language barriers. We study different training schedules and data combination approaches to improve system performance, and we analyze where the limited available data is used most efficiently.