In this work, we present CleanUNet, a causal speech denoising model that operates on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks that refine its bottleneck representations, which is crucial for obtaining good results. The model is optimized through a set of losses defined over both the waveform and multi-resolution spectrograms. The proposed method outperforms state-of-the-art models in denoised speech quality across various objective and subjective evaluation metrics. We release our code and models at https://github.com/nvidia/cleanunet.
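To make the idea of a loss defined over both the waveform and multi-resolution spectrograms concrete, the following is a minimal NumPy sketch, not the released implementation. The abstract does not specify the individual loss terms, so the sketch assumes an L1 waveform term plus, at each resolution, a spectral-convergence term and a log-magnitude L1 term, which is one common formulation. The function names, the three (FFT size, hop) resolutions, and the unweighted sum are all illustrative assumptions.

```python
import numpy as np


def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram of a 1-D signal: Hann-windowed frames, no padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_resolution_stft_loss(pred, target, resolutions, eps=1e-7):
    """Average of spectral-convergence + log-magnitude L1 over all resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        p = stft_magnitude(pred, n_fft, hop)
        t = stft_magnitude(target, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes.
        spectral_convergence = np.linalg.norm(t - p) / (np.linalg.norm(t) + eps)
        # Mean absolute difference of log magnitudes.
        log_magnitude = np.mean(np.abs(np.log(t + eps) - np.log(p + eps)))
        total += spectral_convergence + log_magnitude
    return total / len(resolutions)


def denoising_loss(pred, target,
                   resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Assumed combined loss: L1 on the waveform plus multi-resolution STFT loss."""
    waveform_l1 = np.mean(np.abs(pred - target))
    return waveform_l1 + multi_resolution_stft_loss(pred, target, resolutions)


# Toy usage: a 1-second 440 Hz tone at 16 kHz, with and without added noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

print(denoising_loss(clean, clean))  # identical signals give zero loss
print(denoising_loss(noisy, clean))  # noise gives a strictly positive loss
```

Evaluating the spectrogram terms at several window sizes trades time resolution against frequency resolution, so errors that are hard to see at one scale can still be penalized at another. In training, the same computation would be written in a framework with automatic differentiation.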