Attention has become one of the most widely used mechanisms in deep learning. An attention mechanism helps a model focus on the critical regions of the feature space; for example, high-amplitude regions can play an important role in Speech Emotion Recognition (SER). In this paper, we identify misalignments between the attention weights and the signal amplitude in standard multi-head self-attention. To improve the attended regions, we propose a Focus-Attention (FA) mechanism and a novel Calibration-Attention (CA) mechanism, used in combination with multi-head self-attention. Through the FA mechanism, the network can detect the highest-amplitude part of each segment. Through the CA mechanism, the network can modulate the information flow by assigning a different weight to each attention head, improving its use of the surrounding context. To evaluate the proposed method, experiments are performed on the IEMOCAP and RAVDESS datasets. Experimental results show that the proposed framework significantly outperforms state-of-the-art approaches on both datasets.
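The head-calibration idea described above can be illustrated with a minimal NumPy sketch. This is a hypothetical reading of the abstract, not the authors' implementation: each head of a standard scaled-dot-product self-attention is scaled by a learned, softmax-normalized calibration weight before the heads are merged, so the network can up- or down-weight individual heads. The function name, the random stand-in Q/K/V projections, and the `head_logits` parameterization are all assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def calibrated_self_attention(x, num_heads, head_logits, seed=0):
    """Multi-head self-attention with per-head calibration weights (sketch).

    x           : (seq_len, d_model) input features
    head_logits : (num_heads,) learnable logits; softmax over heads gives
                  each head's calibration weight (assumed parameterization)
    """
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V weights (illustrative only).
    wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3))

    def split_heads(z):
        # (seq_len, d_model) -> (num_heads, seq_len, head_dim)
        return z.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)   # (heads, seq, seq)
    heads = softmax(scores, axis=-1) @ v                    # (heads, seq, head_dim)

    # Calibration: scale each head's output by its normalized weight
    # before merging, so the model can modulate each head's contribution.
    w = softmax(head_logits).reshape(-1, 1, 1)
    return (heads * w).transpose(1, 0, 2).reshape(seq_len, d_model)
```

A Focus-Attention-style bias could analogously be sketched by adding a per-frame weight derived from signal amplitude (e.g., frame energy) to the attention scores, steering the heads toward high-amplitude regions.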