We propose a deep graph approach to speech emotion recognition. Graphs offer a compact, efficient, and scalable way to represent data. Following the theory of graph signal processing, we model a speech signal as a cycle graph or a line graph. Such a graph structure enables us to construct a graph convolution network (GCN)-based architecture that performs an \emph{accurate} graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluate our model for speech emotion recognition on the popular IEMOCAP database. Our model outperforms the standard GCN and other relevant deep graph architectures, indicating the effectiveness of our approach. Compared with existing speech emotion recognition methods, our model achieves state-of-the-art performance (4-class, $65.29\%$) with significantly fewer learnable parameters.
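To illustrate what an exact graph convolution on a cycle graph can look like, the following is a minimal sketch and not the architecture evaluated here. It assumes each speech frame is one node, that consecutive frames are joined by unweighted edges, and that filtering is done by eigendecomposition of the combinatorial Laplacian. The function names `cycle_adjacency` and `exact_graph_conv`, the frame and feature counts, and the heat-kernel filter are hypothetical choices made for illustration only.

```python
import numpy as np


def cycle_adjacency(n):
    """Adjacency matrix of an undirected, unweighted cycle graph with n nodes."""
    a = np.zeros((n, n))
    idx = np.arange(n)
    # Connect node i to node i+1, wrapping the last node back to the first.
    a[idx, (idx + 1) % n] = 1.0
    a[(idx + 1) % n, idx] = 1.0
    return a


def exact_graph_conv(x, adjacency, spectral_filter):
    """Filter node features x (n_nodes x n_feats) in the graph spectral domain.

    spectral_filter maps the Laplacian eigenvalues to per-frequency gains.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # The Laplacian is symmetric, so eigh returns an orthonormal eigenbasis.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    x_hat = eigvecs.T @ x                            # graph Fourier transform
    y_hat = spectral_filter(eigvals)[:, None] * x_hat  # apply the filter
    return eigvecs @ y_hat                           # inverse transform


# Hypothetical input: 8 frames (nodes), 3 features per frame.
rng = np.random.default_rng(0)
n_frames, n_feats = 8, 3
x = rng.standard_normal((n_frames, n_feats))
a = cycle_adjacency(n_frames)

# Assumed low-pass (heat-kernel) filter, chosen only for illustration.
y = exact_graph_conv(x, a, lambda lam: np.exp(-0.5 * lam))
```

Because the adjacency matrix of a cycle graph is circulant, its Laplacian eigenvalues are known in closed form, $2 - 2\cos(2\pi k/n)$, and its graph Fourier basis coincides with the discrete Fourier basis, so the spectral filtering above involves no polynomial approximation of the filter.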