This work investigates complementary features that can aid the widely used Mel-frequency cepstral coefficients (MFCCs) in closed, limited-set word recognition for non-native English speakers with different mother tongues. Unlike MFCCs, which are derived from the spectral energy of the speech signal, the proposed frequency centroids (FCs) capture the spectral centres of the different bands of the speech spectrum, with the bands defined by the Mel filterbank. In combination with MFCCs, these features are observed to provide a relative performance improvement in English word recognition, particularly under varied noisy conditions. A two-stage Convolutional Neural Network (CNN) is used to model the features of English words uttered with Arabic, French and Spanish accents.
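As an illustration of the idea of per-band spectral centres, the following is a minimal sketch of how frequency centroids might be computed for a single speech frame. The abstract does not give the exact formula, so this assumes a common subband-centroid definition: for each Mel filter, the filter-weighted power spectrum is used as a weighting over frequency, and the centroid is the resulting weighted mean frequency. The function names (`mel_filterbank`, `frequency_centroids`), the Hamming window, and the parameter defaults (20 filters, 512-point FFT) are assumptions made for this example, not details taken from the work.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-Mel mapping; an assumption for this sketch.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for k in range(n_filters):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, floored at zero.
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, freqs

def frequency_centroids(frame, sample_rate, n_filters=20, n_fft=512, eps=1e-12):
    """Hypothetical per-band centroid: filter-weighted mean frequency (Hz)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    bank, freqs = mel_filterbank(n_filters, n_fft, sample_rate)
    weighted = bank * power  # per-band weighted power spectrum
    return (weighted @ freqs) / (weighted.sum(axis=1) + eps)

# Usage: one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
fc = frequency_centroids(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
```

Under this definition, each frame yields one centroid per Mel band, so the resulting vector has the same number of entries as there are filters and could be stacked alongside the MFCCs of the same frame. How the features are actually combined and fed to the CNN is not specified in the abstract.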