Advances in artificial intelligence are used to build intelligent machines that assist people and improve user experience. Emotions are fundamental to humans, affecting thinking and everyday activities such as communication, learning, and decision-making. Speech emotion recognition is an area of interest in this regard. In this work, we propose a novel mel-spectrogram learning approach in which our model learns emotions from the WAV-format voice recordings in the popular CREMA-D dataset. The model uses the log mel-spectrogram as its input feature, with the number of mel bands set to 64. It required less training time than other approaches used to address the problem of speech emotion recognition.
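To make the input feature concrete, the following is a minimal sketch of how a log mel-spectrogram with 64 mel bands can be computed from a waveform. Only the value of 64 mel bands comes from the abstract. The sample rate, FFT size, hop length, HTK-style mel formula, and all function names are illustrative assumptions, so this should not be read as the authors' actual pipeline.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel scale; an assumption, the abstract does not name a convention.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=64):
    """Return a log-power mel spectrogram in dB, shape (n_mels, n_frames).

    n_mels=64 follows the abstract; sr, n_fft and hop are assumed values.
    """
    y = np.asarray(y, dtype=float)
    if y.size < n_fft:
        y = np.pad(y, (0, n_fft - y.size))  # zero-pad very short clips
    n_frames = 1 + (y.size - n_fft) // hop
    window = np.hanning(n_fft)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T     # (n_mels, n_frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))        # floor avoids log(0)

# Usage: a synthetic one-second 440 Hz tone stands in for a CREMA-D recording.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)
```

The resulting 2-D array can be treated like a single-channel image, which is the usual way a spectrogram is fed to a learning model. In practice a library routine such as `librosa.feature.melspectrogram` followed by `librosa.power_to_db` would typically replace the hand-written steps.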