This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates the time-frequency masks used for computing the covariance matrices of the sources (speech and noise). Backpropagation-based run-time adaptation of the DNN has been proposed for dealing with mismatched training and test conditions. Alternatively, one may try to estimate the source covariance matrices directly with a state-of-the-art blind source separation method called fast multichannel non-negative matrix factorization (FastMNMF). In practice, however, neither the DNN nor FastMNMF can be updated in a frame-online manner because of their computationally expensive iterative nature. Our DNN-free system leverages the posteriors of the latest source spectrograms given by block-online FastMNMF to derive the current source covariance matrices for frame-online beamforming. The evaluation shows that our frame-online system can quickly respond to scene changes caused by interfering speaker movements and outperforms an existing block-online system with DNN-based beamforming by 5.0 points in terms of the word error rate.
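
To make the role of the source covariance matrices concrete, the following is a minimal NumPy sketch of mask-weighted covariance estimation and MVDR weight computation. It is an illustration under stated assumptions, not the system described above: it uses the common reference-channel form of MVDR, w = (Φ_n⁻¹ Φ_s u) / tr(Φ_n⁻¹ Φ_s), and it does not implement FastMNMF or the frame-online update. The function names, the array layout (frequency, time, channel), and the `eps` regularization are all hypothetical choices made for this example.

```python
import numpy as np


def masked_covariance(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin.

    X:    complex STFT, shape (F, T, M)  (assumed layout: freq, time, channel)
    mask: real weights in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(cov_speech, cov_noise, ref=0, eps=1e-8):
    """Reference-channel MVDR weights, shape (F, M).

    Computes (cov_noise^-1 cov_speech)[:, ref] / trace(cov_noise^-1 cov_speech).
    A small diagonal loading (eps) is an assumption added for numerical safety.
    """
    M = cov_noise.shape[-1]
    cov_noise = cov_noise + eps * np.eye(M)
    numer = np.linalg.solve(cov_noise, cov_speech)            # (F, M, M)
    trace = np.trace(numer, axis1=-2, axis2=-1)[:, None] + eps
    return numer[..., ref] / trace


def apply_beamformer(w, X):
    """Apply y[f, t] = w[f]^H x[f, t] to a multichannel STFT of shape (F, T, M)."""
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

In this sketch the covariance matrices could come from either source mentioned in the abstract: DNN-estimated masks passed to `masked_covariance`, or matrices supplied directly by a separation back-end. For a rank-1 speech covariance built from a steering vector, the resulting weights pass the reference channel of that steering vector unchanged, which is the distortionless constraint.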