Guided Source Separation (GSS) is a popular front-end for distant automatic speech recognition (ASR) systems that use spatially distributed microphones. With such microphones, the choice of reference microphone can strongly influence the quality of the output signal and the downstream ASR performance. In GSS-based speech enhancement, the reference microphone is typically selected using the signal-to-noise ratio (SNR), which is optimal for noise reduction but may neglect differences in early-to-late-reverberant ratio (ELR) across microphones. In this paper, we propose two reference microphone selection methods for GSS-based speech enhancement that are based on the normalized $\ell_p$-norm: the first uses only the normalized $\ell_p$-norm, and the second combines the normalized $\ell_p$-norm with the SNR to account for differences in both SNR and ELR across microphones. Experimental evaluation using a CHiME-8 distant ASR system shows that the proposed $\ell_p$-norm-based methods outperform the baseline method, reducing the macro-average word error rate.
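As a rough illustration of selection by normalized $\ell_p$-norm, the sketch below scores each microphone by the ratio $\|x\|_p / \|x\|_2$ of its time-frequency magnitudes and picks the channel with the lowest score. Everything specific here is an assumption made for illustration and is not taken from the abstract: the exact normalization, the choice $p = 1$, the rule that a lower (sparser) score indicates a less reverberant channel, and the function names `normalized_lp_norm` and `select_reference`. How the score would be combined with the SNR is not specified in the abstract and is not sketched.

```python
import numpy as np


def normalized_lp_norm(x, p=1.0):
    """Assumed definition: ||x||_p / ||x||_2, scaled so a flat vector scores 1.

    For p < 2, lower values indicate a sparser vector.
    """
    mag = np.abs(np.asarray(x)).ravel()
    n = mag.size
    l2 = np.sqrt(np.sum(mag ** 2))
    if l2 == 0.0:
        # An all-zero channel carries no information; give it the worst score.
        return 1.0
    lp = np.sum(mag ** p) ** (1.0 / p)
    # n**(1/p - 1/2) is the largest possible value of lp / l2 (flat vector).
    return float(lp / (l2 * n ** (1.0 / p - 0.5)))


def select_reference(coeffs, p=1.0):
    """Return the index of the channel with the lowest normalized lp-norm.

    coeffs: array of shape (num_mics, ...), e.g. STFT coefficients per mic.
    """
    scores = [normalized_lp_norm(channel, p) for channel in coeffs]
    return int(np.argmin(scores))


if __name__ == "__main__":
    # Toy data: channel 0 is flat (least sparse), channel 1 has a single peak.
    toy = np.array([[1.0, 1.0, 1.0, 1.0],
                    [1.0, 0.0, 0.0, 0.0]])
    print(select_reference(toy, p=1.0))
```

In this toy case the single-peak channel is selected because its energy is concentrated in one coefficient; with real recordings the input would be each microphone's time-frequency coefficients for the target speaker's segments.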