This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the presence of noise as an intervention. Building on the potential outcome framework, the proposed causal inference-based speech enhancement (CISE) uses a noise detector to separate the clean and noisy frames of an intervened noisy speech signal and assigns the two sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE. Specifically, the presence of noise guides EM selection during training, and the noise detector selects the EM for each frame according to its prediction of whether noise is present. Moreover, we derive an SE-specific average treatment effect to quantify the causal effect adequately. Experiments show that CISE outperforms a non-causal mask-based SE approach in the studied settings and achieves better performance and efficiency than more complex SE models.
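The frame-wise routing described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `route_and_enhance`, the 0.5 threshold, and the array shapes are assumptions made for illustration, and the two EMs and the noise detector are stood in for by precomputed masks and per-frame probabilities rather than trained networks.

```python
import numpy as np


def route_and_enhance(noisy_mag, noise_prob, mask_clean, mask_noisy, threshold=0.5):
    """Apply one of two masks to each frame, chosen by a per-frame noise decision.

    noisy_mag:  (frames, bins) magnitude spectrogram of the input speech.
    noise_prob: (frames,) detector output, assumed here to be a probability.
    mask_clean: (frames, bins) mask from the hypothetical EM for clean frames.
    mask_noisy: (frames, bins) mask from the hypothetical EM for noisy frames.
    threshold:  assumed decision threshold; the abstract does not specify one.
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    # Hard per-frame decision: True means the frame is treated as noisy.
    is_noisy = np.asarray(noise_prob) >= threshold
    # Broadcast the frame decision across frequency bins to pick a mask per frame.
    mask = np.where(is_noisy[:, None], mask_noisy, mask_clean)
    return mask * noisy_mag, is_noisy


# Toy input: 4 frames, 3 frequency bins.
noisy_mag = np.full((4, 3), 2.0)
noise_prob = np.array([0.1, 0.9, 0.5, 0.4])
mask_clean = np.ones((4, 3))        # clean-frame EM leaves frames unchanged here
mask_noisy = np.full((4, 3), 0.5)   # noisy-frame EM attenuates frames here
enhanced, is_noisy = route_and_enhance(noisy_mag, noise_prob, mask_clean, mask_noisy)
```

In this sketch both masks are computed for every frame and one is selected afterwards, which keeps the example short; a real system would more plausibly run only the selected module on each set of frames.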