Speech emotion recognition (SER) is playing a growing role in global business, where it helps improve service efficiency in settings such as call centers. Recent SER systems are based on deep learning. However, the performance of deep learning depends on the number of layers: the deeper the network, the higher the performance. On the other hand, deeper networks suffer from the vanishing gradient problem, a low learning rate, and long training times. This paper therefore proposes a redesign of the existing local feature learning block (LFLB), called the deep residual local feature learning block (DeepResLFLB). DeepResLFLB consists of three cascaded blocks: an LFLB, a residual local feature learning block (ResLFLB), and a multilayer perceptron (MLP). The LFLB learns local correlations and extracts hierarchical correlations; DeepResLFLB uses residual learning to learn repeatedly and capture more detail in deeper layers, which addresses the vanishing gradient and reduces overfitting; and the MLP learns the relationships among the learned features and outputs probabilities for the predicted speech emotions and gender types. On two publicly available datasets, EMODB and RAVDESS, the proposed DeepResLFLB significantly improves performance as measured by standard metrics: accuracy, precision, recall, and F1-score.
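The residual learning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture: it assumes a toy fully connected layer in place of the convolutional LFLB, and the names `dense`, `relu`, `plain_layer`, and `residual_block` are hypothetical, chosen only for illustration. It shows the skip connection y = ReLU(x + F(x)), which gives the signal (and, during training, the gradient) a direct path around the learned transformation F.

```python
def dense(x, weights, bias):
    """Affine map: one output per weight row (toy stand-in for a learned layer F)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def plain_layer(x, weights, bias):
    """Ordinary layer: y = ReLU(F(x)). If F outputs zeros, the input is lost."""
    return relu(dense(x, weights, bias))


def residual_block(x, weights, bias):
    """Residual layer: y = ReLU(x + F(x)). The identity path preserves the input.

    Assumes F keeps the dimensionality of x so the two can be added.
    """
    fx = dense(x, weights, bias)
    return relu([xi + fi for xi, fi in zip(x, fx)])


if __name__ == "__main__":
    x = [1.0, 2.0]
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    zero_b = [0.0, 0.0]
    print(plain_layer(x, zero_w, zero_b))     # input is wiped out
    print(residual_block(x, zero_w, zero_b))  # input passes through unchanged
```

With all-zero weights the plain layer discards its input, while the residual block passes it through unchanged. This is the intuition behind stacking such blocks more deeply without the signal fading.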