Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression at comparable speech output quality. However, these models are tightly coupled to speech content and produce unintended outputs in noisy conditions. Building on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer model performs better on clean speech inputs than a compressor model trained only on clean speech.