As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC). PLC is a challenging task because it involves not only real-time speech synthesis, but also frequent transitions between the received audio and the synthesized concealment. We propose a hybrid neural PLC architecture in which the missing speech is synthesized by a generative model conditioned by a predictive model. The resulting algorithm achieves natural concealment that surpasses the quality of existing conventional PLC algorithms, and it ranked second in the Interspeech 2022 PLC Challenge. We show that our solution not only works for uncompressed audio, but is also applicable to a modern speech codec.