This work presents a scalable and efficient neural waveform codec (NWC) for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as its feedforward routine. The proposed CNN autoencoder also defines quantization and entropy coding as trainable modules, so that coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact architectural components into our fully convolutional network model, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed model features a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore the reconstruction loss that its preceding modules have created. CMRL can also scale down to cover lower bitrates, for which it employs a linear predictive coding (LPC) module as its first autoencoder. We redefine LPC's quantization as a trainable module to enhance the bit allocation tradeoff between LPC and the following NWC modules. Compared to other autoregressive decoder-based neural speech coders, our decoder has a significantly smaller architecture, e.g., with only 0.12 million parameters, more than 100 times smaller than a WaveNet decoder. Compared to the LPCNet-based speech codec, which leverages the speech production model to reduce network complexity at low bitrates, ours can scale up to higher bitrates to achieve transparent performance. Our lightweight neural speech coding model achieves subjective scores comparable to AMR-WB at the low bitrate range and provides transparent coding quality at 32 kbps.
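To make the cascaded residual-coding idea concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation: the class names (NWCModule, CMRL, DepthwiseSeparableConv1d), layer sizes, and frame length are assumptions, and the trainable quantization, entropy coding, gated residual blocks, and LPC front end described in the abstract are omitted. It only shows how depthwise separable convolutions reduce parameters and how each autoencoder module codes the residual left by its predecessors.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise separable 1-D convolution: a per-channel (depthwise)
    convolution followed by a 1x1 pointwise convolution, which uses far
    fewer parameters than a standard convolution of the same shape."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class NWCModule(nn.Module):
    """Toy stand-in for one neural waveform codec (NWC) autoencoder.
    The bottleneck is left continuous here; the actual model quantizes
    it with a trainable quantizer and entropy-codes the result."""
    def __init__(self, channels=32, kernel_size=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size, padding=kernel_size // 2),
            nn.Tanh(),
            DepthwiseSeparableConv1d(channels, kernel_size),
        )
        self.decoder = nn.Sequential(
            DepthwiseSeparableConv1d(channels, kernel_size),
            nn.Tanh(),
            nn.Conv1d(channels, 1, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class CMRL(nn.Module):
    """Cross-module residual learning: each module encodes the residual
    that the sum of all preceding modules' reconstructions leaves behind."""
    def __init__(self, num_modules=2):
        super().__init__()
        self.coders = nn.ModuleList(NWCModule() for _ in range(num_modules))

    def forward(self, x):
        recon = torch.zeros_like(x)
        for coder in self.coders:
            recon = recon + coder(x - recon)  # code the current residual
        return recon


if __name__ == "__main__":
    frames = torch.randn(4, 1, 512)   # a batch of 512-sample speech frames
    model = CMRL(num_modules=2)
    out = model(frames)
    print(out.shape)                  # torch.Size([4, 1, 512])
```

In this sketch, adding more modules to the cascade corresponds to spending more bits on finer residuals, which is how the architecture scales across bitrates.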