Speech distortions are a long-standing problem that degrades the performance of supervised speech processing models. It is therefore important to enhance the robustness of speech processing models so that they perform well on distorted speech without sacrificing performance on clean speech. In this work, we propose to improve the robustness of speech processing models with domain adversarial training (DAT). We conducted experiments on five different speech processing tasks within the SUPERB framework. Since the distortion types of speech data are not always known, we analyzed both a binary-domain setting, which treats all distorted speech as a single domain, and a multi-domain setting, which treats each distortion type as a separate domain. Compared with supervised training methods, we obtained promising results in target domains where speech is corrupted by various distortions, including unseen distortions introduced only at test time.
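To make the idea concrete, below is a minimal sketch of domain adversarial training with a gradient reversal layer, where the number of domain classes distinguishes the binary-domain setting (clean vs. distorted) from the multi-domain setting (one class per distortion type). The PyTorch module names, feature dimension, and loss weighting are illustrative assumptions, not the exact configuration used in this work.

```python
# A minimal DAT sketch, assuming PyTorch; feature dimension, heads, and
# the adversarial weight `lambd` are hypothetical placeholders.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class DATModel(nn.Module):
    def __init__(self, feat_dim=768, num_classes=10, num_domains=2):
        super().__init__()
        # num_domains=2 -> binary-domain setting (clean vs. distorted);
        # num_domains>2 -> multi-domain setting (one class per distortion).
        self.task_head = nn.Linear(feat_dim, num_classes)
        self.domain_head = nn.Linear(feat_dim, num_domains)

    def forward(self, feats, lambd=1.0):
        task_logits = self.task_head(feats)
        # The reversed gradient pushes the features toward domain invariance.
        domain_logits = self.domain_head(grad_reverse(feats, lambd))
        return task_logits, domain_logits

# One training step: minimize the task loss plus the adversarial domain loss.
model = DATModel()
feats = torch.randn(8, 768)               # stand-in for upstream speech features
task_labels = torch.randint(0, 10, (8,))
domain_labels = torch.randint(0, 2, (8,))
task_logits, domain_logits = model(feats, lambd=0.1)
loss = nn.functional.cross_entropy(task_logits, task_labels) \
     + nn.functional.cross_entropy(domain_logits, domain_labels)
loss.backward()
```

In this sketch, the domain classifier is trained to predict the (binary or multi-class) domain label, while the reversed gradient discourages the shared features from encoding domain information, which is the mechanism DAT relies on to stay robust to distortions.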