Low- and ultra-low-bitrate neural speech coding achieves unprecedented coding gain by generating speech signals from compact speech features. This paper improves coding efficiency further by using a recurrent neural predictor to reduce the temporal redundancy in the frame-level feature sequence. The prediction yields a low-entropy residual representation, which we code discriminatively according to its contribution to signal reconstruction. Combining feature prediction with discriminative coding results in a dynamic bit allocation algorithm that spends more bits on unpredictable but rare events. As a result, we develop a scalable, lightweight, low-latency, and low-bitrate neural speech coding system. We demonstrate the advantage of the proposed methods using LPCNet as the neural vocoder. Although the proposed method keeps its prediction causal, subjective tests and feature-space analysis show that our model achieves superior coding efficiency compared to LPCNet and Lyra V2 at very low bitrates.
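The interplay of causal prediction and discriminative residual coding can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the system described above: a leaky recurrence stands in for the trained recurrent neural predictor, a fixed magnitude threshold stands in for the discriminative coding rule, and the function name `predictive_code` and the parameters `alpha`, `threshold`, and `step` are hypothetical.

```python
def predictive_code(frames, alpha=0.5, threshold=0.1, step=0.05):
    """Toy closed-loop predictive coder for a list of equal-length feature frames.

    Returns (reconstruction, coded_mask). All parameter values are
    illustrative assumptions, not values from the paper.
    """
    dim = len(frames[0])
    state = [0.0] * dim  # recurrent state, updated identically by encoder and decoder
    recon, coded = [], []
    for frame in frames:
        pred = state  # causal: depends only on past reconstructed frames
        resid = [x - p for x, p in zip(frame, pred)]
        if max(abs(r) for r in resid) > threshold:
            # Significant residual: spend bits on a uniform scalar quantizer.
            q = [step * round(r / step) for r in resid]
            coded.append(True)
        else:
            # Predictable frame: transmit nothing and trust the prediction.
            q = [0.0] * dim
            coded.append(False)
        out = [p + d for p, d in zip(pred, q)]
        recon.append(out)
        # Leaky recurrence (assumption) in place of a trained recurrent predictor.
        state = [alpha * s + (1 - alpha) * o for s, o in zip(state, out)]
    return recon, coded


# A flat feature track with one abrupt change: only frames around the
# change should need residual bits.
frames = [[0.0] * 4] * 50 + [[1.0] * 4] * 50
recon, coded = predictive_code(frames)
print(sum(coded), "of", len(frames), "frames carry residual bits")
```

Because the predictor is driven by reconstructed frames rather than the original features, a decoder that receives the same quantized residuals can reproduce the predictions exactly. Bits are spent only where the prediction fails, which is the dynamic allocation behavior the abstract describes.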