Personal narratives (PNs), spoken or written, are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. In this work, however, we focus on Emotion Carriers (ECs), defined as the segments (speech or text) that best explain the emotional state of the narrator (e.g., "loss of father", "made me choose"). Once extracted, ECs can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. Previous work has shown that ECs can be identified using lexical features. Spoken narratives, however, should provide a richer description of the context and of the user's emotional state. In this paper, we leverage word-based acoustic and textual embeddings, as well as early and late fusion techniques, for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect ECs. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.
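To make the distinction between early and late fusion concrete, the following is a minimal sketch, not the system described in the abstract. It assumes each word already has an acoustic and a textual embedding, and it replaces the actual models (e.g., the fine-tuned ResNet) with hypothetical linear taggers using random weights. The function names, dimensions, the averaging rule, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify the fusion operators that were used.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def early_fusion(acoustic, textual, w, b=0.0):
    """Early fusion: concatenate per-word embeddings, then apply ONE tagger.

    acoustic: (n_words, d_a), textual: (n_words, d_t), w: (d_a + d_t,)
    Returns per-word probabilities of belonging to an emotion carrier.
    """
    fused = np.concatenate([acoustic, textual], axis=1)
    return sigmoid(fused @ w + b)


def late_fusion(p_acoustic, p_textual, alpha=0.5):
    """Late fusion: combine the outputs of TWO separately trained taggers.

    Here a weighted average of per-word posteriors (an assumed rule).
    """
    return alpha * p_acoustic + (1.0 - alpha) * p_textual


# Hypothetical toy data: 6 words, 8-dim acoustic and 4-dim textual embeddings.
rng = np.random.default_rng(0)
n_words, d_a, d_t = 6, 8, 4
acoustic = rng.normal(size=(n_words, d_a))
textual = rng.normal(size=(n_words, d_t))

# Early fusion: a single (placeholder) linear tagger over the joint vector.
w_early = rng.normal(size=d_a + d_t)
p_early = early_fusion(acoustic, textual, w_early)

# Late fusion: one (placeholder) tagger per modality, merged at the output.
w_a = rng.normal(size=d_a)
w_t = rng.normal(size=d_t)
p_a = sigmoid(acoustic @ w_a)
p_t = sigmoid(textual @ w_t)
p_late = late_fusion(p_a, p_t, alpha=0.5)

# Per-word EC decisions under an assumed 0.5 threshold.
ec_early = p_early >= 0.5
ec_late = p_late >= 0.5
```

In this sketch, early fusion lets a single model see both modalities at once, whereas late fusion keeps the modality-specific models independent and only merges their word-level decisions; `alpha` would normally be tuned on held-out data.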