We apply a sequence-to-sequence model to mitigate the impact of speech recognition errors on open-domain end-to-end dialog generation. We cast the task as a domain adaptation problem in which ASR transcriptions and the original text belong to two different domains. Our proposed model uses a separate encoder for each domain and makes their hidden states similar, so that the decoder predicts the same dialog text from either input. The method shows that a sequence-to-sequence model can learn that an ASR transcription and its original text have the same meaning, and can thereby eliminate the speech recognition errors. Experimental results on the Cornell movie dialog dataset demonstrate that the domain adaptation system helps the spoken dialog system generate responses that are more similar to the original text answers.
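The idea of making the two encoders' hidden states similar while a shared decoder predicts the same dialog text can be illustrated with a small sketch of a joint training objective. The abstract does not state which similarity measure or loss weighting is used, so the mean squared distance, the `weight` parameter, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math


def mean_squared_distance(h_asr, h_text):
    """Mean squared difference between two equal-length hidden-state vectors.

    Assumption: the abstract only says the hidden states are made "similar";
    mean squared distance is one possible choice, used here for illustration.
    """
    if len(h_asr) != len(h_text):
        raise ValueError("hidden states must have the same dimensionality")
    return sum((a - b) ** 2 for a, b in zip(h_asr, h_text)) / len(h_asr)


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under a predicted distribution."""
    return -math.log(probs[target_index])


def joint_loss(h_asr, h_text, decoder_probs, target_ids, weight=1.0):
    """Average decoder loss plus a weighted penalty on encoder-state mismatch.

    h_asr, h_text  : hidden states from the (hypothetical) ASR and text encoders
    decoder_probs  : one predicted token distribution per output position
    target_ids     : reference token index per output position
    weight         : hypothetical trade-off between the two terms
    """
    nll = sum(cross_entropy(p, t) for p, t in zip(decoder_probs, target_ids))
    nll /= len(target_ids)
    return nll + weight * mean_squared_distance(h_asr, h_text)


# Toy usage: two 2-dimensional hidden states and a single-token response.
loss = joint_loss(
    h_asr=[1.0, 2.0],
    h_text=[1.0, 4.0],
    decoder_probs=[[0.5, 0.5]],
    target_ids=[0],
    weight=0.5,
)
print(loss)
```

In this sketch the penalty term vanishes when both encoders produce the same hidden state, so the decoder receives the same representation whether the input is an ASR transcription or the original text.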