Convolutional neural network (CNN) modules are widely used to build high-performance speech enhancement models. However, the feature extraction power of vanilla CNN modules is limited by the dimensionality constraints of their convolution kernels; as a result, they cannot adequately model noise context information at the feature extraction stage. To address this, we introduce a robust context-aware feature extraction strategy for single-channel speech enhancement that adds a recurrency factor to the feature-extracting CNN layers. Adding recurrency captures the local statistics of noise attributes at the extracted-feature level, so the proposed model remains effective at differentiating speech cues even under very noisy conditions. When evaluated against enhancement models built on vanilla CNN modules under unseen noise conditions, the proposed model with recurrency in the feature extraction layers yields a segmental SNR (SSNR) gain of up to 1.5 dB and an improvement of 0.4 in subjective quality on the Mean Opinion Score scale, while reducing the number of parameters to be optimized by 25%.
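The core idea of recurrency in the feature extraction layers can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy example (not the authors' implementation): a 1-D convolution is applied per spectrogram frame, and a convolved hidden state from the previous frame is mixed in, so each extracted feature also depends on the local temporal noise context. All names (`conv1d_same`, `recurrent_conv_features`) and the update rule h_t = tanh(conv(x_t, w_in) + conv(h_{t-1}, w_rec)) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, k):
    # 'same'-padded 1-D convolution of one feature vector x with kernel k
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(k)] @ k for i in range(len(x))])

def recurrent_conv_features(frames, w_in, w_rec):
    # Hypothetical recurrent-convolutional extractor:
    # h_t = tanh(conv(x_t, w_in) + conv(h_{t-1}, w_rec)),
    # so the feature for frame t carries context from earlier noisy frames.
    h = np.zeros(frames.shape[1])
    out = []
    for x_t in frames:                     # iterate over time frames
        h = np.tanh(conv1d_same(x_t, w_in) + conv1d_same(h, w_rec))
        out.append(h)
    return np.stack(out)

# toy "spectrogram": 5 time frames x 16 frequency bins
frames = rng.standard_normal((5, 16))
w_in = rng.standard_normal(3)              # input convolution kernel
w_rec = rng.standard_normal(3)             # recurrent convolution kernel
feats = recurrent_conv_features(frames, w_in, w_rec)
print(feats.shape)  # one context-aware feature vector per frame
```

A plain CNN extractor would drop the `w_rec` term and process each frame independently; the recurrent term is what lets the extracted features track slowly varying noise statistics across frames.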