The increasingly stringent quality-of-experience requirements of 5G/B5G communication systems have driven the emergence of neural speech enhancement techniques. These techniques, however, have been developed in isolation from existing expert-rule-based models of speech pronunciation and distortion, such as the classic Linear Predictive Coding (LPC) speech model, because such models are difficult to integrate with auto-differentiable machine learning frameworks. In this paper, to improve the efficiency of neural speech enhancement, we introduce an LPC-based speech enhancement (LPCSE) architecture, which leverages the strong inductive biases of the LPC speech model in conjunction with the expressive power of neural networks. Differentiable end-to-end learning is achieved in LPCSE via two novel blocks: one that uses expert rules to reduce the computational overhead of integrating the LPC speech model into neural networks, and one that ensures the stability of the model and avoids exploding gradients in end-to-end training by mapping the linear prediction coefficients to the filter poles. The experimental results show that LPCSE successfully restores the formants of speech distorted by transmission loss, and outperforms two existing neural speech enhancement methods with comparably sized neural networks in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on the LJ Speech corpus.
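The link between filter poles and stability mentioned above can be illustrated with a minimal sketch. It is not the LPCSE block itself. It assumes one common pole-domain parameterization: unconstrained real numbers are squashed into pole radii below 1, and the LPC polynomial is rebuilt from conjugate pole pairs, so the all-pole synthesis filter 1/A(z) is stable by construction. The function names, the `max_radius` parameter, and the use of NumPy in place of an auto-differentiable framework are all assumptions made for illustration.

```python
import numpy as np

def poles_to_lpc(mag_logits, angles, max_radius=0.999):
    """Map unconstrained parameters to the coefficients of a stable A(z).

    mag_logits, angles: arrays of shape (K,), one entry per conjugate pole pair.
    max_radius is a hypothetical safety margin that keeps poles off the unit circle.
    Returns [1, a_1, ..., a_2K] for A(z) = 1 + sum_k a_k z^-k.
    """
    logits = np.asarray(mag_logits, dtype=float)
    radii = max_radius / (1.0 + np.exp(-logits))      # sigmoid, so 0 < radius < max_radius
    poles = radii * np.exp(1j * np.asarray(angles, dtype=float))
    all_poles = np.concatenate([poles, np.conj(poles)])
    # Conjugate pairs give a real polynomial; np.real drops rounding residue.
    return np.real(np.poly(all_poles))

def is_stable(a):
    """True if every root of A(z), i.e. every filter pole, lies inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))

def synthesize(a, excitation):
    """Filter the excitation through 1/A(z): y[n] = e[n] - sum_k a_k y[n-k]."""
    order = len(a) - 1
    out = np.zeros(len(excitation) + order)           # leading zeros act as initial state
    for n, e in enumerate(excitation):
        past = out[n:n + order][::-1]                 # y[n-1], ..., y[n-order]
        out[n + order] = e - np.dot(a[1:], past)
    return out[order:]

# Usage: three pole pairs give a 6th-order LPC polynomial.
a = poles_to_lpc([3.0, -1.0, 0.5], [0.3, 1.2, 2.5])
impulse = np.zeros(2000)
impulse[0] = 1.0
response = synthesize(a, impulse)
```

Because every radius stays below `max_radius`, the impulse response decays regardless of the values the unconstrained parameters take, which is the property that keeps recursive filtering from blowing up during training.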