In this paper, a new speech feature fusion method is proposed for speaker recognition on the basis of the cross gate parallel convolutional neural network (CG-PCNN). Mel filter bank features (MFBFs) of different frequency resolutions are extracted from each frame of a speaker's speech by several Mel filter banks, each containing a different number of triangular filters. Because these MFBFs have different frequency resolutions, they carry complementary information. The CG-PCNN extracts deep features from the MFBFs and applies a cross gate mechanism to capture this complementary information, which improves the performance of the speaker recognition system. The deep features are then concatenated to form the fusion feature used for speaker recognition. Experimental results show that the speaker recognition system with the proposed feature fusion method is effective and marginally outperforms existing state-of-the-art systems.
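The multi-resolution front end described above (one speech frame, several Mel filter banks that differ only in their number of triangular filters) can be sketched in NumPy. This is a minimal illustration under assumed settings, not the paper's configuration: the 16 kHz sample rate, 512-point FFT, Hamming window, log compression and the filter counts 24, 40 and 64 are all hypothetical choices, and the helper names are illustrative. Only the MFBF extraction is shown; the CG-PCNN and its cross gate mechanism are not modeled.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style Hz-to-Mel mapping (assumed; the paper does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points, equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Center frequency of each rfft bin.
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle: rises from left edge to center, falls to right edge, zero elsewhere.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def mfbf(frame, n_filters, n_fft=512, sample_rate=16000):
    """Log Mel filter bank energies of one frame (hypothetical helper)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    energies = mel_filter_bank(n_filters, n_fft, sample_rate) @ power
    # Small floor avoids log(0) for silent frames.
    return np.log(energies + 1e-10)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # a 25 ms frame at 16 kHz (synthetic signal)
    # Same frame, three frequency resolutions: fewer filters means wider triangles.
    features = {n: mfbf(frame, n) for n in (24, 40, 64)}
    for n, feat in features.items():
        print(n, feat.shape)  # 24 (24,), then 40 (40,), then 64 (64,)
```

Each resolution yields a feature vector whose length equals its filter count; in the proposed method, each such MFBF stream would be fed to its own branch of the CG-PCNN before the resulting deep features are concatenated.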