Real-time single-channel speech separation aims to unmix an audio stream, captured from a single microphone and containing multiple simultaneous talkers, environmental noise, and reverberation, into multiple de-reverberated, noise-free speech tracks, each containing only one talker. While large state-of-the-art DNNs can achieve excellent separation of anechoic speech mixtures, the main challenge is to create compact, causal models that can separate reverberant mixtures at inference time. In this paper, we explore low-complexity, resource-efficient, causal DNN architectures for real-time separation of two or more simultaneous speakers. A cascade of three neural network modules is trained to sequentially perform noise suppression, separation, and de-reverberation. For comparison, a larger end-to-end model is trained to output two anechoic speech signals directly from noisy reverberant speech mixtures. We propose an efficient single-decoder architecture with subtractive separation for real-time recursive speech separation of two or more speakers. Evaluation on real monophonic recordings of speech mixtures, using speech separation measures such as SI-SDR, perceptual measures such as DNS-MOS, and a novel channel separation metric, shows that these compact causal models can separate speech mixtures with low latency and perform on par with large offline state-of-the-art models such as SepFormer.
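For readers unfamiliar with SI-SDR, one of the evaluation measures named above, the following is a minimal sketch of its commonly used definition, not the authors' evaluation code. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Assumption: remove the mean so both signals are zero-mean.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to find the optimal scaling.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not the scaled reference is distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 220.0 * t)  # stand-in for a clean speech track
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
```

Because the reference is rescaled to best match the estimate, multiplying the estimate by any non-zero gain leaves the score essentially unchanged, which is why the measure is suited to separation models whose output level is not constrained.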