Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based and work like black boxes: it is not clear how they utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, these techniques often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2Lens, to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations of intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports multi-faceted exploration of model behaviors across the language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate that M2Lens can help users gain deep insights into multimodal models for sentiment analysis.