Speaker recognition systems are widely used to identify a person by their voice; however, the high variability of speech signals makes this a challenging task. Emotional variation is particularly difficult to handle because emotions alter a speaker's voice characteristics, so the acoustic features of emotional speech differ from those used to train models on neutral speech. Consequently, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although convolutional neural networks (CNNs) have brought considerable advances in speaker identification, their pooling operations prevent them from preserving the spatial (pose) relationships between low-level features. Capsule networks (CapsNets), a recent deep learning architecture, were introduced to overcome this limitation. Inspired by them, this study investigates the performance of CapsNets in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated on three distinct speech databases: the Emirati Speech Database, the SUSAS dataset, and the open-access RAVDESS. The proposed model is also compared with baseline systems. Experimental results demonstrate that the proposed CapsNet model trains faster and provides better results than current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance is also studied by varying the number of routing iterations, both with and without a decoder network.
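To make the role of the routing iterations concrete, the following is a minimal pure-Python sketch of the standard dynamic routing-by-agreement procedure commonly used in CapsNets. It assumes that this is the routing algorithm referred to above; it is not the authors' implementation. The function names (`squash`, `dynamic_routing`, `predict_speaker`) and the toy prediction vectors are hypothetical and chosen only for illustration.

```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iterations=3):
    """Route lower-level capsules to upper-level capsules.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j (here, one upper capsule per hypothetical speaker).
    Returns the list of upper-capsule output vectors.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for it in range(num_iterations):
        # Coupling coefficients: each lower capsule distributes its vote.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(dim)]
            v.append(squash(s_j))
        # Raise the logits where a prediction agrees with the current output.
        if it < num_iterations - 1:
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v

def capsule_length(vec):
    return math.sqrt(sum(x * x for x in vec))

def predict_speaker(v):
    """Pick the upper capsule with the longest output vector."""
    lengths = [capsule_length(vj) for vj in v]
    return max(range(len(lengths)), key=lengths.__getitem__)

# Toy input (assumed values): three lower capsules, two candidate speakers.
# The predictions for speaker 0 agree; those for speaker 1 conflict.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.9, 0.1], [-1.0, 0.0]],
    [[1.0, -0.1], [0.0, 1.0]],
]
```

Calling `predict_speaker(dynamic_routing(u_hat, num_iterations=r))` for different values of `r` shows how additional iterations concentrate the coupling coefficients on the capsule whose incoming predictions agree. In a full model, the number of iterations is a hyperparameter that trades computation against how sharply agreement is enforced.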