Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources. To further improve speech quality, it is common to apply postfiltering to the target speech estimate produced by spatial filtering. In this work, a Minimum Variance Distortionless Response (MVDR) beamformer is employed to provide an estimate of the interference alongside the estimate of the target speech, and both are later used for postfiltering. This improves enhancement performance over a single-input baseline far more than increasing the model's complexity does. The results suggest that postfiltering requires fewer computing resources when it is given both the target and the interference signals, which is a step towards an online speech enhancement system for multi-speech scenarios.
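
For readers unfamiliar with MVDR, the following is a minimal NumPy sketch of the textbook narrowband weight formula, w = R⁻¹d / (dᴴR⁻¹d), applied to one frequency bin. It is not the system described in this work: the function name `mvdr_weights`, the toy array geometry, the synthetic signals, and in particular the way the interference estimate is formed (reference microphone minus beamformer output) are all assumptions made for illustration only.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one frequency bin (hypothetical helper)."""
    # Solve R x = d rather than forming the explicit inverse of R.
    rinv_d = np.linalg.solve(noise_cov, steering)
    # Normalise so that w^H d = 1 (the distortionless constraint).
    return rinv_d / np.vdot(steering, rinv_d)


# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_mics, n_frames = 4, 256

# Steering vector of a half-wavelength-spaced linear array, source at 20 degrees.
steering = np.exp(-1j * np.pi * np.arange(n_mics) * np.sin(np.deg2rad(20.0)))

# Synthetic target and interference-plus-noise signals in one frequency bin.
target = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = rng.standard_normal((n_mics, n_frames)) + 1j * rng.standard_normal((n_mics, n_frames))
mixture = steering[:, None] * target[None, :] + noise

# Noise covariance estimated from the noise-only signal, with light diagonal loading.
noise_cov = noise @ noise.conj().T / n_frames + 1e-6 * np.eye(n_mics)

w = mvdr_weights(noise_cov, steering)

# Target estimate: y = w^H x for every frame.
target_est = w.conj() @ mixture

# ASSUMPTION: one simple way to obtain an interference estimate is to subtract
# the target estimate from the reference microphone (index 0).
interference_est = mixture[0] - target_est

distortionless_gain = np.vdot(w, steering)  # w^H d
```

In this sketch both `target_est` and `interference_est` would be the two inputs handed to a postfilter; how the postfilter consumes them is outside the scope of the example.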