This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that jointly performs denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. To make the optimization computationally tractable, we incorporate two techniques: Source-Wise Factorization (SW-Fact) of a CBF and Independent Vector Extraction (IVE). To further improve performance, we develop a method that integrates neural network (NN)-based estimation of source power spectra into the CBF optimization via an inverse-Gamma prior. Experiments on noisy reverberant mixtures show that the proposed method, in both the blind and NN-guided scenarios, greatly outperforms the conventional state-of-the-art NN-supported mask-based CBF in terms of automatic speech recognition performance and signal distortion reduction.
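To illustrate how an inverse-Gamma prior can inject an NN-estimated power spectrum into a variance estimate, the following is a minimal sketch and not the paper's algorithm. It assumes a zero-mean complex Gaussian model for each time-frequency bin with variance λ and a prior λ ~ IG(α, β); maximizing the posterior then gives λ = (|y|² + β) / (α + 2). The function name `map_variance`, the choice β = α · (NN power estimate), and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def map_variance(observed_power, alpha, beta):
    """MAP estimate of the variance lam for y ~ CN(0, lam), lam ~ IG(alpha, beta).

    The log-posterior is -(alpha + 2) * log(lam) - (|y|^2 + beta) / lam + const,
    whose maximizer is (|y|^2 + beta) / (alpha + 2).
    """
    return (observed_power + beta) / (alpha + 2.0)


# Toy values (hypothetical): |y|^2 per time-frequency bin and an NN power estimate.
observed_power = np.array([4.0, 1.0, 0.0])
nn_power = np.array([2.0, 2.0, 2.0])

# Assumption: the prior scale is tied to the NN estimate as beta = alpha * nn_power.
alpha = 1.0
beta = alpha * nn_power

lam = map_variance(observed_power, alpha, beta)
```

In this sketch, a larger α pulls the estimate toward a value set by the NN output, while a smaller α lets the observed power dominate. Note that a bin with zero observed power still receives a strictly positive variance, which keeps later divisions by λ well defined.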