Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error (WPE) dereverberation and minimum power distortionless response (MPDR) beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes sparsity of the desired speech component in the short-time Fourier transform (STFT) domain relative to the noisy microphone signals. In this paper, we generalize the convolutional WPD beamformer by using an lp-norm cost function, which introduces an adjustable shape parameter that makes it possible to control the sparsity of the desired speech component. Experiments on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics.
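
To make the role of the shape parameter concrete, the following is a minimal NumPy sketch for a single frequency bin. It rests on several assumptions that the abstract does not state: the lp-norm cost is minimized with a standard iteratively reweighted least-squares (IRLS) surrogate whose weights are |x_t|^(2-p); the steering vector `v` is known; and the function names, the prediction delay, the number of taps, and the diagonal loading are all hypothetical choices made for illustration. Under this surrogate, p = 0 gives weights equal to the estimated power |x_t|^2, the kind of time-varying variance weighting associated with the Gaussian model, while p = 2 gives uniform weights.

```python
import numpy as np


def lp_cost(x, p):
    """Sum of |x_t|^p over frames; for p < 2 sparse signals score lower than dense ones of equal energy."""
    return float(np.sum(np.abs(x) ** p))


def irls_weights(x, p, eps=1e-8):
    """IRLS weights |x_t|^(2 - p), floored for numerical stability (assumed surrogate)."""
    return np.maximum(np.abs(x) ** 2, eps) ** (1.0 - p / 2.0)


def stack_frames(Y, delay, taps):
    """Stack the current frame on top of `taps` delayed past frames.

    Y has shape (M, T): M microphones, T STFT frames of one frequency bin.
    Returns an array of shape (M * (taps + 1), T).
    """
    M, T = Y.shape
    blocks = [Y]
    for k in range(taps):
        shift = delay + k
        past = np.zeros_like(Y)
        if shift < T:
            past[:, shift:] = Y[:, :T - shift]
        blocks.append(past)
    return np.concatenate(blocks, axis=0)


def lp_wpd_filter(Y, v, p=0.5, delay=2, taps=3, n_iter=5, eps=1e-8):
    """Hypothetical IRLS loop for a distortionless convolutional filter.

    Each iteration solves: minimize sum_t |w^H ybar_t|^2 / lam_t
    subject to w^H vbar = 1, then refreshes lam_t from the new output.
    """
    M, T = Y.shape
    Ybar = stack_frames(Y, delay, taps)
    # Steering vector padded with zeros for the delayed taps.
    vbar = np.concatenate([v, np.zeros(M * taps, dtype=complex)])
    x = Y[0].copy()  # initial estimate: first microphone (assumption)
    for _ in range(n_iter):
        lam = irls_weights(x, p, eps)
        # Weighted spatio-temporal covariance matrix.
        R = (Ybar / lam) @ Ybar.conj().T / T
        # Small diagonal loading to keep R well conditioned.
        R = R + 1e-10 * np.trace(R).real / R.shape[0] * np.eye(R.shape[0])
        Rinv_v = np.linalg.solve(R, vbar)
        w = Rinv_v / (vbar.conj() @ Rinv_v)
        x = w.conj() @ Ybar  # beamformer output w^H ybar_t per frame
    return w, x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((3, 200)) + 1j * rng.standard_normal((3, 200))
    v = np.array([1.0, 0.5 + 0.2j, -0.3j])
    w, x = lp_wpd_filter(Y, v, p=0.5)
    print("lp cost of output:", lp_cost(x, 0.5))
```

The demo uses random data only to exercise the code path, so its output says nothing about speech quality. On real recordings, `Y` would hold the STFT coefficients of one frequency bin and the loop would be run once per bin.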