In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance in Voice over Internet Protocol (VoIP) applications. Our approach adapts DNS 2020 models to the specific acoustic characteristics of VoIP communications, which include distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose VoIP-DNS, a multi-task learning framework that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on diverse VoIP scenarios and show that it surpasses industry performance and outperforms state-of-the-art speech enhancement methods on VoIP applications. Our results demonstrate that models trained on DNS 2020 can be improved and tailored to different VoIP platforms using VoIP-DNS. These findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.
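As a rough illustration of what joint optimization in a multi-task setting can look like, the sketch below combines an enhancement loss with a weighted auxiliary loss. It is not the paper's formulation: the mean-squared-error objectives, the auxiliary VoIP-related target, the `aux_weight` parameter, and the function names are all assumptions made for this example.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    aux_pred: Sequence[float],
    aux_target: Sequence[float],
    aux_weight: float = 0.5,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Weighted sum of an enhancement loss and an auxiliary-task loss.

    `enhanced`/`clean` stand in for the noise-suppression output and its
    reference; `aux_pred`/`aux_target` stand in for an assumed
    VoIP-specific auxiliary prediction and its label.
    """
    return mse(enhanced, clean) + aux_weight * mse(aux_pred, aux_target)
```

In such a setup a single scalar is minimized, so gradients from both tasks update the shared model; `aux_weight` controls how strongly the auxiliary task influences training.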