Multimodal emotion recognition in conversations (mERC) is an active research topic in natural language processing (NLP) that aims to predict human emotional states in communication across multiple modalities, e.g., natural language and facial gestures. Human language and conversations are filled with implicit prejudices and preconceptions, which raises the question of whether current data-driven mERC approaches produce biased errors. For example, such approaches may assign higher emotion scores to utterances by women than to those by men. In addition, existing debiasing models mainly focus on gender or race, and multibias mitigation remains an unexplored task in mERC. In this work, we take a first step toward addressing these issues by proposing a series of approaches to mitigate five typical kinds of bias in textual utterances (i.e., gender, age, race, religion, and LGBTQ+) and two in visual representations (i.e., gender and age), followed by a Multibias-Mitigated and sentiment Knowledge Enriched bi-modal Transformer (MMKET). Comprehensive experimental results show the effectiveness of the proposed model and demonstrate that the debiasing operation has a great impact on classification performance for mERC. We hope our study will benefit the development of bias mitigation in mERC and related emotion studies.