Previous works on emotion recognition in conversation (ERC) follow a two-step paradigm: first producing context-independent features by fine-tuning pretrained language models (PLMs), and then analyzing contextual information and dialogue structure information among the extracted features. However, we discover that this paradigm has several limitations. Accordingly, we propose a novel paradigm, i.e., exploring contextual information and dialogue structure information in the fine-tuning step, and adapting the PLM to the ERC task in terms of input text, classification structure, and training strategy. Furthermore, we develop our model BERT-ERC according to the proposed paradigm, which improves ERC performance through three aspects: suggestive text, a fine-grained classification module, and two-stage training. Compared to existing methods, BERT-ERC achieves substantial improvements on four datasets, indicating its effectiveness and generalization capability. In addition, we set up a limited-resources scenario and an online prediction scenario to approximate real-world conditions. Extensive experiments demonstrate that the proposed paradigm significantly outperforms the previous one and can be adapted to various scenarios.