Neural audio coding has recently shown very promising results in the literature, largely outperforming traditional codecs, but limited attention has been paid to its error resilience. Neural codecs trained for source coding alone tend to be extremely sensitive to channel noise, especially over wireless channels with high error rates. In this paper, we investigate how to improve the error resilience of neural audio codecs against the packet losses that often occur in real-time communications. We propose a feature-domain packet loss concealment algorithm (FD-PLC) for real-time neural speech coding. Specifically, we introduce a self-attention-based module that operates on the received latent features to recover lost frames in the feature domain before the decoder. A hybrid segment-level and frame-level frequency-domain discriminator is employed to guide the network to focus on both the generative quality of the lost frames and their continuity with neighbouring frames. Experimental results on several error patterns show that the proposed scheme achieves better robustness than the corresponding error-free and error-resilient baselines. We also show that feature-domain concealment is superior to its waveform-domain counterpart applied as post-processing.
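The abstract does not specify how the self-attention module is built. As a rough illustration only, the NumPy sketch below shows one way masked attention over received latent frames could fill in lost ones before the decoder. The function names, the sinusoidal position encoding, the causal mask, and the projection matrices `w_q`, `w_k`, `w_v` are all hypothetical assumptions, not the authors' FD-PLC architecture; in a real system the projections would be learned, for example jointly with the adversarial losses.

```python
import numpy as np


def sinusoidal_positions(num_frames: int, dim: int) -> np.ndarray:
    """Sinusoidal position encoding of shape (num_frames, dim) (assumed choice)."""
    pos = np.arange(num_frames)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def conceal_lost_frames(features, received, w_q, w_k, w_v, causal=True):
    """Hypothetical feature-domain concealment via masked self-attention.

    features: (T, D) latent frames; rows of lost frames are ignored.
    received: (T,) boolean mask, True where the packet arrived.
    w_q, w_k, w_v: (D, D) projection matrices (would be learned in practice).
    causal: if True, a frame may only attend to itself and earlier frames,
        an assumption made here to avoid look-ahead latency.
    """
    received = np.asarray(received, dtype=bool)
    T, D = features.shape
    # Lost frames carry no content, so zero them before adding positions.
    x = np.where(received[:, None], features, 0.0)
    h = x + sinusoidal_positions(T, D)
    q, k = h @ w_q, h @ w_k
    v = x @ w_v
    scores = q @ k.T / np.sqrt(D)

    # Only received frames may be attended to (and only past ones if causal).
    allowed = np.broadcast_to(received[None, :], (T, T)).copy()
    if causal:
        allowed &= np.tril(np.ones((T, T), dtype=bool))
    has_context = allowed.any(axis=1)

    scores = np.where(allowed, scores, -np.inf)
    scores[~has_context] = 0.0  # avoid NaN rows; their output is zeroed below
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)
    attended = np.where(has_context[:, None], weights @ v, 0.0)

    # Received frames pass through unchanged; only lost frames are replaced.
    return np.where(received[:, None], features, attended)
```

With identity projections, each concealed frame is a convex combination of the received frames it is allowed to see, which makes the masking easy to check. Setting `causal=False` lets concealment use future received frames as well, which would add latency in a real-time codec.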