This paper proposes a dual-stage, low-complexity, reconfigurable technique for enhancing speech contaminated by various types of noise. Driven by the input data and audio content, the proposed approach performs coarse processing in the first stage and fine processing in the second stage. We demonstrate that the proposed solution significantly improves the 3-fold QUality Evaluation of Speech in Telecommunication (3QUEST) metrics, comprising the speech mean opinion score (SMOS) and noise MOS (NMOS), in both near-field and far-field applications. It also greatly improves the signal-to-noise ratio (SNR) and the subjective listening experience. By comparison, traditional speech enhancement methods reduce SMOS even though they increase NMOS and SNR. In addition, the proposed scheme can be easily adopted in both the capture path and the speech render path of speech communication systems, conferencing systems, and voice-trigger applications.