We study deep learning (DL) based speech enhancement for virtual meetings on cellular devices, where transmitted speech suffers from background noise and transmission loss, both of which degrade speech quality. Because the Deep Noise Suppression (DNS) Challenge dataset does not contain such practical disturbances, we collect a transmitted DNS (t-DNS) dataset using Zoom Meetings over the T-Mobile network. We select two baseline models: Demucs and FullSubNet. Demucs is an end-to-end model that takes time-domain inputs and outputs time-domain denoised speech, whereas FullSubNet takes time-frequency-domain inputs and outputs the energy ratio of the target speech in the inputs. The goal of this project is to enhance speech transmitted over cellular networks using deep learning models.
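To illustrate what "the energy ratio of the target speech in the inputs" can mean for a time-frequency model, the following is a minimal sketch and not the FullSubNet implementation. It assumes a plain magnitude-domain ratio mask, a Hann-windowed STFT with hypothetical parameters (`n_fft=512`, `hop=256`), and a synthetic sine tone standing in for speech; the helper names `stft` and `energy_ratio_mask` are our own.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Frame the signal with a Hann window and take a real FFT per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def energy_ratio_mask(clean, noise, eps=1e-8):
    """Per-bin ratio of target energy to total (target + noise) energy."""
    s = np.abs(stft(clean)) ** 2
    n = np.abs(stft(noise)) ** 2
    return s / (s + n + eps)

# Synthetic stand-ins (assumptions): a 440 Hz tone as "speech", white noise as disturbance.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.shape)
noisy = clean + noise

# Oracle mask computed from the known clean and noise signals.
mask = energy_ratio_mask(clean, noise)
# Applying the mask to the noisy spectrogram attenuates noise-dominated bins.
enhanced_spec = mask * stft(noisy)
```

In this sketch the mask is an oracle target computed from known clean and noise signals; a model in the time-frequency family would be trained to predict such a ratio from the noisy input alone, whereas a time-domain model such as Demucs maps the noisy waveform directly to a denoised waveform.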