This paper proposes a noise-type-classification-aided, attention-based neural network approach for monaural speech enhancement. The network extends a previous work by introducing a noise classification subnetwork into the structure and feeding the classification embedding into the attention mechanism, which guides the network toward better feature extraction. Specifically, to make the network end-to-end, an audio encoder and decoder built from temporal convolutions transform between the waveform and the spectrogram. Additionally, our model is composed of two long short-term memory (LSTM) based encoders, two attention mechanisms, a noise classifier and a speech mask generator. Experiments show that, compared with OM-LSA and the previous work, the proposed approach achieves better speech quality as measured by PESQ. More promisingly, our approach generalizes better to unseen noise conditions.
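As an illustration of how a noise classification embedding could steer an attention mechanism, the following is a minimal sketch in plain Python. It is not the network described above: the abstract does not state how the embedding enters the attention computation, so adding the embedding to the query of a scaled dot-product attention is an assumption made here. The function name `noise_aware_attention`, the toy feature values and the embedding values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def noise_aware_attention(query, keys, values, noise_embedding):
    """Scaled dot-product attention with a noise-conditioned query.

    Assumption: the noise embedding is simply added to the query.
    The abstract does not specify the actual conditioning scheme.
    """
    conditioned = [q + n for q, n in zip(query, noise_embedding)]
    scale = math.sqrt(len(conditioned))
    weights = softmax([dot(conditioned, k) / scale for k in keys])
    dim = len(values[0])
    # Context vector: attention-weighted sum of the value vectors.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


# Toy example: three frames of 2-dimensional features (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
query = [1.0, 0.0]
noise_embedding = [0.0, 2.0]  # hypothetical output of a noise classifier

context, weights = noise_aware_attention(query, keys, values, noise_embedding)
plain_context, plain_weights = noise_aware_attention(query, keys, values, [0.0, 0.0])
```

In this toy setting the noise embedding shifts the attention toward frames whose keys align with it, so `weights` differs from `plain_weights`. A trained model would learn both the embedding and the projections rather than use fixed values.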