Recent achievements in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data using unified deep network models. Building such models for compressing audio signals has been challenging because it requires discrete representations, which are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) with the training strategy of variational autoencoders (VAEs) and a binary representation of the latent space. We apply a reparametrization trick for the Bernoulli distribution over the discrete representations, which allows smooth backpropagation. In addition, our approach allows the separation of the encoder and decoder, which is necessary for compression tasks. To the best of our knowledge, this is the first end-to-end learning approach for a single audio compression model with RNNs, and our model achieves a Signal-to-Distortion Ratio (SDR) of 20.54.
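To illustrate the core difficulty the abstract refers to, here is a minimal NumPy sketch of one common way to make sampling from a Bernoulli latent differentiable: the forward pass emits hard {0, 1} codes, while the backward pass routes gradients through the Bernoulli mean (a straight-through-style surrogate). This is an assumption-laden illustration of the general technique, not the paper's exact reparametrization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_st_forward(logits, rng):
    """Sample hard binary codes z ~ Bernoulli(sigmoid(logits)).

    The hard codes are what a compression decoder would receive."""
    p = sigmoid(logits)
    u = rng.random(p.shape)
    z = (u < p).astype(np.float64)  # non-differentiable step
    return z, p

def bernoulli_st_backward(grad_z, p):
    """Straight-through surrogate gradient (illustrative assumption):
    pretend z ≈ p in the backward pass, so dz/dlogits ≈ p * (1 - p)."""
    return grad_z * p * (1.0 - p)

rng = np.random.default_rng(0)
logits = np.array([-2.0, 0.0, 3.0])     # encoder outputs (hypothetical)
z, p = bernoulli_st_forward(logits, rng)
grad_logits = bernoulli_st_backward(np.ones_like(z), p)
```

The surrogate lets gradients flow to the encoder even though the sampled codes are discrete, which is the property the abstract's "smooth backpropagation" refers to.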