Speech restoration aims to remove distortions from speech signals. Prior methods mainly address a single type of distortion, as in speech denoising or dereverberation. In the real world, however, a speech signal can be degraded by several distortions simultaneously. It is thus important to extend speech restoration models to handle multiple distortions. In this paper, we introduce VoiceFixer, a unified framework for high-fidelity speech restoration. VoiceFixer restores speech affected by multiple distortions (e.g., noise, reverberation, and clipping) and can extend low-bandwidth degraded speech (e.g., noisy speech) to 44.1 kHz full-bandwidth, high-fidelity speech. We design VoiceFixer around (1) an analysis stage that predicts intermediate-level features from the degraded speech, and (2) a synthesis stage that generates the waveform using a neural vocoder. Both objective and subjective evaluations show that VoiceFixer is effective on severely degraded speech, such as real-world historical speech recordings. Samples of VoiceFixer are available at https://haoheliu.github.io/voicefixer.
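To make the notion of simultaneous distortions concrete, the following is a minimal NumPy sketch of how clean speech might be degraded by reverberation, noise, clipping, and bandwidth reduction at once. It is an illustration only, not the degradation pipeline used for VoiceFixer: the function names, the synthetic impulse response, the order in which distortions are applied, and all parameter values are assumptions made for this example.

```python
import numpy as np


def add_noise(speech, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    noise = rng.standard_normal(speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def add_reverb(speech, sample_rate, rng, decay_seconds=0.1):
    """Convolve with a synthetic impulse response (assumed: decaying noise)."""
    n_taps = int(decay_seconds * sample_rate)
    t = np.arange(n_taps) / sample_rate
    rir = rng.standard_normal(n_taps) * np.exp(-t / (decay_seconds / 5.0))
    rir[0] = 1.0  # direct path
    rir /= np.sqrt(np.sum(rir ** 2))  # unit energy
    return np.convolve(speech, rir)[: len(speech)]


def clip_waveform(speech, threshold):
    """Hard-clip samples to the range [-threshold, threshold]."""
    return np.clip(speech, -threshold, threshold)


def reduce_bandwidth(speech, cutoff_hz, sample_rate):
    """Zero all frequency content above cutoff_hz (ideal low-pass filter)."""
    spectrum = np.fft.rfft(speech)
    freqs = np.fft.rfftfreq(len(speech), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(speech))


def degrade(speech, sample_rate, rng):
    """Apply all four distortions in one assumed order with assumed settings."""
    out = add_reverb(speech, sample_rate, rng)
    out = add_noise(out, snr_db=10.0, rng=rng)
    out = clip_waveform(out, threshold=0.25)
    out = reduce_bandwidth(out, cutoff_hz=4000.0, sample_rate=sample_rate)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 44100
    t = np.arange(sample_rate) / sample_rate
    # Stand-in for one second of clean speech: two sinusoids.
    clean = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 9000 * t)
    degraded = degrade(clean, sample_rate, rng)
    print(degraded.shape)
```

A restoration model in the sense described above would take a signal like `degraded` as input and aim to recover `clean`, including the frequency content above the cutoff that the bandwidth reduction removed.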