Two critical challenges still hinder the development of emotion recognition in conversation (ERC). First, little work has explored mining deeper insights from the data itself for conversational emotion tasks. Second, existing systems are vulnerable to randomly missing modality features, a common occurrence in realistic settings. To address these two challenges, we propose a novel framework for incomplete multimodal learning in ERC, called "Inverted Teacher-studEnt seArCH Network (ITEACH-Net)." ITEACH-Net comprises two novel components: the Emotion Context Changing Encoder (ECCE) and the Inverted Teacher-Student (ITS) framework. Specifically, leveraging the tendency of emotional states to remain locally stable within a conversational context, ECCE captures these patterns and further perceives their evolution over time. Recognizing that incomplete and complete data pose different challenges, ITS employs a teacher-student framework to decouple the respective computations. Through Neural Architecture Search, the student model then develops stronger computational capabilities for handling incomplete data than the teacher model. For testing, we design a novel evaluation method that measures the model's performance under different missing rates without altering the model weights. We conduct experiments on three benchmark ERC datasets, and the results demonstrate that ITEACH-Net outperforms existing methods in incomplete multimodal ERC. We believe ITEACH-Net can inspire further research on the intrinsic nature of emotions in conversational scenarios and pave a more robust route for incomplete learning techniques. Code will be made available.
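To make the evaluation setting concrete, the following is a minimal sketch of scoring one fixed predictor under several missing rates. It is not the authors' implementation. The function names, the zero-filling of missing features, and the rule that every utterance keeps at least one modality are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def random_modality_mask(num_utterances, num_modalities, missing_rate, seed=0):
    """Return one 0/1 row per utterance; 0 marks a missing modality."""
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    rng = random.Random(seed)
    mask = []
    for _ in range(num_utterances):
        row = [0 if rng.random() < missing_rate else 1
               for _ in range(num_modalities)]
        if sum(row) == 0:
            # Assumption: each utterance keeps at least one modality.
            row[rng.randrange(num_modalities)] = 1
        mask.append(row)
    return mask


def apply_mask(features, mask):
    """Zero out the feature vectors of missing modalities.

    features[u][m] is the feature vector (list of floats) of modality m
    for utterance u. Zero-filling is an assumption of this sketch.
    """
    return [
        [[x * keep for x in vec] for vec, keep in zip(utt, row)]
        for utt, row in zip(features, mask)
    ]


def evaluate_over_missing_rates(predict, features, labels, rates, seed=0):
    """Score one fixed predictor at each missing rate, without retraining."""
    results = {}
    for rate in rates:
        mask = random_modality_mask(len(features), len(features[0]), rate, seed)
        masked = apply_mask(features, mask)
        preds = [predict(utt) for utt in masked]
        correct = sum(p == y for p, y in zip(preds, labels))
        results[rate] = correct / len(labels)
    return results
```

Here `predict` stands in for any trained model whose weights stay frozen across all missing rates; only the input mask changes between runs.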