In noisy environments, speech can be hard for humans to understand. Spoken dialog systems can enhance the intelligibility of their output, either by modifying the speech synthesis (e.g., imitating Lombard speech) or by optimizing the language generation. Here we focus on the second approach, in which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases presented in babble noise and perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which outperforms baseline models with a relative improvement of 31.37% at SNR -5 dB.