Deep learning algorithms are increasingly used for speech enhancement (SE). In supervised methods, both global and local information is required for accurate spectral mapping, and a key limitation of many models is their poor capture of contextual information. To leverage long-term contextual information for target speakers and to compensate for distortions in the enhanced speech, this paper adopts a sequence-to-sequence (S2S) mapping structure and proposes a novel monaural speech enhancement system consisting of a Feature Extraction Block (FEB), a Compensation Enhancement Block (ComEB), and a Mask Block (MB). In the FEB, a U-Net block extracts abstract features from complex-valued spectra, with one path suppressing background noise in the magnitude domain using masking methods; the MB takes magnitude features from the FEB and compensates for the lost complex-domain features produced by the ComEB to restore the final enhanced speech. Experiments are conducted on the LibriSpeech dataset, and the results show that the proposed model outperforms recent models in terms of ESTOI and PESQ scores.
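The magnitude-masking-plus-complex-compensation idea described above can be sketched as follows. This is a minimal illustration only: the function name and the assumption that the mask comes from the MB and the correction term from the ComEB are hypothetical, standing in for the learned blocks of the actual system.

```python
import numpy as np

def enhance_frame(noisy_spec, mask, compensation):
    """Combine magnitude-domain masking with a complex-domain correction.

    noisy_spec:   complex-valued STFT frame of the noisy speech
    mask:         real-valued mask in [0, 1] (stand-in for the Mask Block output)
    compensation: complex-valued correction (stand-in for the ComEB output)
    """
    magnitude = np.abs(noisy_spec)
    phase = np.angle(noisy_spec)
    # Suppress noise in the magnitude domain, reusing the noisy phase.
    masked = mask * magnitude * np.exp(1j * phase)
    # Add back complex-domain detail lost by magnitude-only masking.
    return masked + compensation
```

With a mask of all ones and zero compensation the frame passes through unchanged, which makes the decomposition easy to sanity-check before plugging in learned estimates.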