Recent years have seen a surge in the number of available frameworks for speech enhancement (SE) and recognition. Whether model-based or constructed via deep learning, these frameworks often rely in isolation on either time-domain signals or time-frequency (TF) representations of speech data. In this study, we investigate the advantages of each set of approaches by separately examining their impact on speech intelligibility and quality. Furthermore, we combine the fragmented benefits of time-domain and TF speech representations by introducing two new cross-domain SE frameworks. A quantitative comparative analysis against recent model-based and deep learning SE approaches is performed to illustrate the merit of the proposed frameworks.