With advances in deep learning, the performance of speech enhancement systems in the presence of background noise has improved significantly. However, improving robustness against reverberation is still a work in progress, as reverberation tends to cause a loss of formant structure due to smearing effects in time and frequency. A wide range of deep learning-based systems either enhance the magnitude response and reuse the distorted phase, or enhance the complex spectrogram using a complex time-frequency mask. Though these approaches have demonstrated satisfactory performance, they do not directly address the formant structure lost to reverberation. We believe that retrieving the formant structure can help improve the efficiency of existing systems. In this study, we propose SkipConvGAN, an extension of our prior work, SkipConvNet. The proposed system's generator network tries to estimate an efficient complex time-frequency mask, while the discriminator network helps drive the generator to restore the lost formant structure. We evaluate the performance of the proposed system on simulated and real recordings of reverberant speech from the single-channel task of the REVERB challenge corpus. The proposed system shows a consistent improvement across multiple room configurations over other deep learning-based generative adversarial frameworks.
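The difference between the two families of approaches mentioned in the abstract (magnitude enhancement with reused phase versus complex time-frequency masking) can be illustrated with a minimal NumPy sketch. This is not the SkipConvGAN implementation: the function names, the toy spectrogram, and the mask values are all hypothetical, and in a real system the mask would be estimated by a network rather than set by hand.

```python
import numpy as np

def apply_magnitude_mask(reverb_spec, mag_mask):
    """Scale the magnitude only; the reverberant (distorted) phase is reused."""
    return mag_mask * np.abs(reverb_spec) * np.exp(1j * np.angle(reverb_spec))

def apply_complex_mask(reverb_spec, complex_mask):
    """Complex multiplication modifies both magnitude and phase."""
    return complex_mask * reverb_spec

# Toy 2x2 complex "spectrogram" (frequency bins x frames); values are made up.
reverb_spec = np.array([[1 + 1j, 2 - 1j],
                        [0.5j, -1 + 0j]])

# Hypothetical masks: the complex mask has the same gain as the magnitude
# mask, plus an arbitrary 45-degree phase correction.
mag_mask = np.array([[0.5, 1.0],
                     [0.8, 0.2]])
complex_mask = mag_mask * np.exp(1j * np.pi / 4)

mag_out = apply_magnitude_mask(reverb_spec, mag_mask)
cplx_out = apply_complex_mask(reverb_spec, complex_mask)
```

In this toy case both outputs have identical magnitudes, but only the complex mask changes the phase, which is the degree of freedom a magnitude-only system leaves untouched.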