Textual Echo Cancellation

In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings. Such a system can largely improve speech recognition performance and user experience for intelligent devices such as smart speakers, as the user can talk to the device while the device is still playing the TTS signal responding to the previous query. We implement this system by using a novel sequence-to-sequence model with multi-source attention that takes both the microphone mixture signal and source text of the TTS playback as inputs, and predicts the enhanced audio. Experiments show that the textual information of the TTS playback is critical to enhancement performance. Besides, the text sequence is much smaller in size compared with the raw acoustic signal of the TTS playback, and can be immediately transmitted to the device or ASR server even before the playback is synthesized. Therefore, our proposed approach effectively reduces Internet communication and latency compared with alternative approaches such as acoustic echo cancellation (AEC).
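The abstract names "multi-source attention" but does not spell out the mechanism. As a rough illustration only, the sketch below shows one common way a decoder step can attend to two sources at once: it runs scaled dot-product attention separately over the encoded microphone frames and the encoded TTS text, then concatenates the two context vectors. The choice of dot-product attention, the concatenation, the function names, and the toy vectors are all assumptions made for this example, not the paper's actual model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Scaled dot-product attention of one query over a list of state vectors.

    Returns the weighted-average context vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, state)) / scale for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]
    return context, weights


def multi_source_context(query, audio_states, text_states):
    """Hypothetical multi-source step: attend to each source, then concatenate."""
    audio_ctx, audio_w = attend(query, audio_states)
    text_ctx, text_w = attend(query, text_states)
    return audio_ctx + text_ctx, audio_w, text_w


# Toy inputs (assumed for illustration): 3 encoded microphone frames and
# 2 encoded text tokens, each a 2-dimensional vector.
query = [1.0, 0.0]
audio_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_states = [[0.0, 1.0], [1.0, 1.0]]

context, audio_w, text_w = multi_source_context(query, audio_states, text_states)
print(len(context), audio_w, text_w)
```

In a real system the query would come from the decoder state at each output step, and the combined context would feed the layers that predict the enhanced audio. Keeping a separate attention per source lets the model align with the TTS text independently of how it aligns with the microphone signal.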

Related content

Speech synthesis

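Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on several disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation and statistical parametric synthesis, and then to hybrid synthesis. The quality and naturalness of synthesized speech have improved markedly and can now largely meet the needs of certain application settings. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, in car navigation systems, and in automated call centers, where it has delivered substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, applications of speech synthesis are extending into entertainment, spoken-language teaching, and rehabilitation therapy. Speech synthesis now touches many aspects of everyday life.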
