Affective computing plays an important role in human-machine interaction. This paper proposes a speech emotion recognition (SER) system that operates on the speech signal and introduces new techniques at each stage of processing. The system consists of three stages: feature extraction, feature selection, and classification. In the first stage, a complex set of long-term statistical features is extracted from both the speech signal and the glottal-waveform signal, combining new and diverse prosodic, spectral, and spectro-temporal features. One challenge for SER systems is distinguishing correlated emotions; these features discriminate well between speech emotions and improve the system's ability to recognize both similar and dissimilar emotions. The resulting high-dimensional feature vector naturally contains redundancy. In the second stage, its dimensionality is reduced using classical feature selection techniques together with a new quantum-inspired technique. In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier, which performs classification in three steps: sparse random feature learning, orthogonal random projection using singular value decomposition (SVD), and discriminative classification using generalized Tikhonov regularization. In addition, many existing emotional datasets have imbalanced class distributions, which increases classification error and degrades system performance. To address this, the paper also proposes a new weighting method for class imbalance that is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases.
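The classification stage can be illustrated with a minimal sketch. It is not the paper's classifier: it assumes a single hidden layer rather than a deep sparse one, uses plain inverse-class-frequency weights as a stand-in for the proposed weighting method, and uses ordinary ridge (Tikhonov) regularization with an identity penalty. The names `train_weighted_elm`, `predict_elm`, `n_hidden`, and `reg` are hypothetical.

```python
import numpy as np

def train_weighted_elm(X, y, n_hidden=64, reg=1e-2, seed=0):
    """Fit a single-hidden-layer ELM with class-weighted ridge output weights."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    classes, idx = np.unique(y, return_inverse=True)

    # Random projection, orthogonalised with the SVD: W = U @ Vt has
    # orthonormal rows or columns, whichever dimension is smaller.
    A = rng.standard_normal((n_features, n_hidden))
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    W = U @ Vt
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                      # hidden-layer features

    T = np.eye(len(classes))[idx]               # one-hot targets
    counts = T.sum(axis=0)
    # Assumption: inverse class frequency, a common baseline weighting.
    w = (n_samples / (len(classes) * counts))[idx]

    # Weighted Tikhonov solution: beta = (H^T D H + reg I)^-1 H^T D T
    HtD = H.T * w
    beta = np.linalg.solve(HtD @ H + reg * np.eye(n_hidden), HtD @ T)
    return W, b, beta, classes

def predict_elm(model, X):
    """Return the predicted class label for each row of X."""
    W, b, beta, classes = model
    scores = np.tanh(X @ W + b) @ beta
    return classes[np.argmax(scores, axis=1)]
```

Calling `train_weighted_elm(X, y)` on a feature matrix and label vector returns a model tuple that `predict_elm` accepts. The per-sample weights scale each class's contribution to the least-squares loss, so a minority emotion class is not dominated by the majority classes.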