This work presents a scalable and efficient neural waveform codec (NWC) for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as its feedforward routine. The proposed CNN autoencoder also defines quantization and entropy coding as trainable modules, so that coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact architectural components into our fully convolutional network model, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed model features a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore the reconstruction loss that its preceding modules have created. CMRL can also scale down to cover lower bitrates, for which it employs a linear predictive coding (LPC) module as its first autoencoder. We redefine LPC's quantization as a trainable module to enhance the bit allocation tradeoff between LPC and the following NWC modules. Compared to other autoregressive decoder-based neural speech coders, our decoder has a significantly smaller architecture, e.g., with only 0.12 million parameters, more than 100 times smaller than a WaveNet decoder. Compared to the LPCNet-based speech codec, which leverages the speech production model to reduce network complexity at low bitrates, ours can scale up to higher bitrates to achieve transparent performance. Our lightweight neural speech coding model achieves subjective scores comparable to AMR-WB at the low bitrate range and provides transparent coding quality at 32 kbps.
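To make the cascaded residual-coding idea concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation: the class names (NWCModule, CMRL, DepthwiseSeparableConv1d), layer sizes, and frame length are assumptions, and the trainable quantization, entropy coding, gated residual blocks, and LPC front end described in the abstract are omitted. It only shows how depthwise separable convolutions reduce parameters and how each autoencoder module codes the residual left by its predecessors.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise separable 1-D convolution: a per-channel (depthwise)
    convolution followed by a 1x1 pointwise convolution, which uses far
    fewer parameters than a standard convolution of the same shape."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class NWCModule(nn.Module):
    """Toy stand-in for one neural waveform codec (NWC) autoencoder.
    The bottleneck is left continuous here; the actual model quantizes
    it with a trainable quantizer and entropy-codes the result."""
    def __init__(self, channels=32, kernel_size=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=kernel_size // 2),
            nn.Tanh(),
            DepthwiseSeparableConv1d(channels, kernel_size),
        )
        self.decoder = nn.Sequential(
            DepthwiseSeparableConv1d(channels, kernel_size),
            nn.Tanh(),
            nn.Conv1d(channels, 1, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class CMRL(nn.Module):
    """Cross-module residual learning: each module encodes the residual
    that the sum of all preceding modules' reconstructions leaves behind."""
    def __init__(self, num_modules=2):
        super().__init__()
        self.coders = nn.ModuleList(NWCModule() for _ in range(num_modules))

    def forward(self, x):
        recon = torch.zeros_like(x)
        for coder in self.coders:
            recon = recon + coder(x - recon)  # code the current residual
        return recon


if __name__ == "__main__":
    frames = torch.randn(4, 1, 512)   # a batch of 512-sample speech frames
    model = CMRL(num_modules=2)
    out = model(frames)
    print(out.shape)                  # torch.Size([4, 1, 512])
```

In this sketch, adding more modules to the cascade corresponds to spending more bits on finer residuals, which is how the architecture scales across bitrates.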