A key concern in voice recognition is the choice of acoustic features, which must be robust to the variability of the speech signal, mismatched conditions, and noisy environments. Accordingly, a variety of speech feature extraction techniques have been developed. In this paper, we investigate the robustness of several front-end techniques for Arabic speaker identification. We evaluate five different features under babble, factory, and subway noise conditions at various signal-to-noise ratios (SNRs). The results show that two of the auditory features, namely gammatone frequency cepstral coefficients (GFCC) and power-normalized cepstral coefficients (PNCC), perform substantially better than the conventional speaker feature, Mel-frequency cepstral coefficients (MFCC), whereas their combination does not.
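To make the evaluation setting concrete, the following is a minimal sketch of how a noise recording might be scaled and added to clean speech to reach a target SNR. It is not the authors' procedure: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine and Gaussian signals standing in for speech and noise, and the SNR levels of 0, 5, and 10 dB are all assumptions chosen for illustration.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR of a mixture given the clean reference signal."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_resid = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_clean / p_resid)


# Synthetic stand-ins (assumptions): a 200 Hz tone for speech, white noise
# for the babble/factory/subway recordings used in the paper.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr_db in (0, 5, 10):  # hypothetical SNR levels, not taken from the paper
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

In a real experiment, the noisy signal would then be passed to each front end (for example MFCC, GFCC, or PNCC) before speaker identification, so that all features are compared on identical degraded inputs.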