Recognizing emotions from text in multimodal architectures has yielded promising results, surpassing the video and audio modalities under certain circumstances. However, the methodology used to collect multimodal data can significantly affect the recognition of emotional features in language. In this paper, we examine the influence of data collection methodology on two multimodal emotion recognition datasets, the IEMOCAP dataset and the OMG-Emotion Behavior dataset, by analyzing their textual compositions and emotion recognition accuracy. Experiments with the full IEMOCAP dataset indicate that its composition negatively influences generalization performance when compared to the OMG-Emotion Behavior dataset. We conclude by discussing the impact this may have on human-robot interaction (HRI) experiments.