Multimodal affect recognition plays an important role in enhancing human-computer interaction. However, relevant data are hard to come by and notably costly to annotate, which poses a challenging barrier to building robust multimodal affect recognition systems. Models trained on these relatively small datasets tend to overfit, and the improvement gained by using complex state-of-the-art models is marginal compared to simple baselines. Meanwhile, there are many different multimodal affect recognition datasets, though each may be small. In this paper, we propose to leverage these datasets jointly using weakly supervised multi-task learning to improve the generalization performance on each of them. Specifically, we explore three multimodal affect recognition tasks: 1) emotion recognition; 2) sentiment analysis; and 3) sarcasm recognition. Our experimental results show that multi-tasking benefits all three tasks, achieving improvements of up to 2.9% in accuracy and 3.3% in F1-score. Furthermore, our method also helps to improve the stability of model performance. In addition, our analysis suggests that weak supervision can provide a contribution comparable to strong supervision if the tasks are highly correlated.
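To make the multi-task setup concrete, below is a minimal sketch (not the paper's implementation) of hard-parameter-sharing multi-task learning over the three tasks, with weakly labeled batches down-weighted in the loss. All module names, dimensions, class counts, and the weak-label weighting scheme are illustrative assumptions; the multimodal features are assumed to be fused and precomputed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskAffectModel(nn.Module):
    """Shared encoder with one lightweight head per affect task (a sketch)."""
    def __init__(self, input_dim=512, hidden_dim=256,
                 n_emotions=6, n_sentiments=3, n_sarcasm=2):
        super().__init__()
        # Shared encoder over fused multimodal features (assumed precomputed).
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.1))
        # Task-specific classification heads.
        self.heads = nn.ModuleDict({
            "emotion": nn.Linear(hidden_dim, n_emotions),
            "sentiment": nn.Linear(hidden_dim, n_sentiments),
            "sarcasm": nn.Linear(hidden_dim, n_sarcasm),
        })

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

def multitask_loss(model, batches, weak_weight=0.5):
    """Sum per-task losses; `weak_weight` for weak labels is a hypothetical choice."""
    loss = torch.tensor(0.0)
    for task, (x, y, is_weak) in batches.items():
        logits = model(x, task)
        w = weak_weight if is_weak else 1.0
        loss = loss + w * F.cross_entropy(logits, y)
    return loss
```

In this sketch, the shared encoder is what lets the small datasets regularize one another, while the per-task heads keep the label spaces separate; how much a weakly labeled task helps would then depend on its correlation with the strongly labeled one, mirroring the analysis above.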