Generative adversarial networks (GANs) still have several problems when applied to the speech enhancement (SE) task. Some GAN-based systems directly adopt the Pixel-to-Pixel structure without any task-specific optimization, and the importance of the generator network has not been fully explored. Other related studies modify the generator network but operate in the time-frequency domain, which ignores the phase mismatch problem. To address these problems, this paper proposes a deep complex convolution recurrent GAN (DCCRGAN) structure. Its complex module models the correlation between the magnitude and phase of the waveform and has been shown to be effective. The proposed structure is trained in an end-to-end manner. Different LSTM layers are used in the generator network to thoroughly explore the speech enhancement performance of DCCRGAN. Experimental results confirm that the proposed DCCRGAN outperforms state-of-the-art GAN-based SE systems.
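The abstract does not give the layer details of the complex module. A common way to build complex convolutions, as in DCCRN-style networks, is to keep separate real and imaginary weights and combine four real-valued convolutions, so that magnitude and phase are transformed jointly rather than independently. The pure-Python sketch below illustrates that decomposition under this assumption; the function names `conv1d` and `complex_conv1d` are hypothetical and are not the paper's code.

```python
# Minimal sketch of a complex-valued 1-D convolution, ASSUMING the DCCRN-style
# decomposition into real-valued convolutions. Names are illustrative only.

def conv1d(x, w):
    """Valid-mode 1-D cross-correlation of a real sequence x with a real kernel w
    (no kernel flip, matching what deep-learning 'conv' layers compute)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real convolutions:
    (W_r + jW_i) * (X_r + jX_i) = (W_r*X_r - W_i*X_i) + j(W_r*X_i + W_i*X_r)."""
    rr = conv1d(x_re, w_re)  # W_r * X_r
    ii = conv1d(x_im, w_im)  # W_i * X_i
    ri = conv1d(x_im, w_re)  # W_r * X_i
    ir = conv1d(x_re, w_im)  # W_i * X_r
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


if __name__ == "__main__":
    # Input 1, 2+1j, 3 and kernel 1+1j, 1, split into real/imaginary parts.
    print(complex_conv1d([1, 2, 3], [0, 1, 0], [1, 1], [1, 0]))  # ([3, 4], [2, 3])
```

Because the real and imaginary outputs each mix both input parts, the operation is equivalent to multiplying complex numbers. A single learned kernel therefore scales the magnitude and rotates the phase together, which is the property the abstract attributes to the complex module.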