Multi-modal aspect-based sentiment classification (MABSC) is an emerging classification task that aims to classify the sentiment toward a given target, such as an entity mentioned in data with multiple modalities. For typical multi-modal data consisting of text and an image, previous approaches neither make full use of the fine-grained semantics of the image, especially in conjunction with the semantics of the text, nor adequately model the relationship between fine-grained image information and the target. This leads to insufficient use of the image and inadequate identification of fine-grained aspects and opinions. To tackle these limitations, we propose a new framework, SeqCSG, which includes a method for constructing sequential cross-modal semantic graphs and an encoder-decoder model. Specifically, we extract fine-grained information from the original image, the image caption, and the scene graph, and regard it, together with tokens from the text, as elements of the cross-modal semantic graph. The cross-modal semantic graph is represented as a sequence with a multi-modal visible matrix indicating the relationships between elements. To utilize the cross-modal semantic graph effectively, we propose an encoder-decoder method with a target prompt template. Experimental results show that our approach outperforms existing methods and achieves state-of-the-art performance on two standard MABSC datasets. Further analysis demonstrates the effectiveness of each component and shows that our model can implicitly learn the correlation between the target and fine-grained image information.
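To make the sequence-plus-visible-matrix idea concrete, the following is a minimal sketch of how such a matrix could be assembled as a boolean attention mask over a flattened sequence of text tokens and appended cross-modal graph elements. The function name, the edge convention, and the choice to keep text tokens fully mutually visible are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def build_visible_matrix(num_tokens, num_graph_elements, edges):
    """Hypothetical multi-modal visible matrix as a boolean attention mask.

    num_tokens: number of text tokens, placed first in the sequence and
        assumed fully visible to each other (ordinary self-attention).
    num_graph_elements: number of cross-modal graph elements appended
        after the text tokens (e.g. caption tokens, scene-graph nodes).
    edges: (i, j) index pairs in the flattened sequence; related
        elements are made mutually visible.
    """
    n = num_tokens + num_graph_elements
    mask = np.zeros((n, n), dtype=bool)
    # Text tokens attend to one another as in a standard Transformer.
    mask[:num_tokens, :num_tokens] = True
    # Every element is visible to itself.
    np.fill_diagonal(mask, True)
    # Graph edges expose related elements to each other.
    for i, j in edges:
        mask[i, j] = mask[j, i] = True
    return mask

# Toy example: 3 text tokens + 2 graph elements, where the first graph
# element (index 3) is linked to token 1 and to the second graph element.
m = build_visible_matrix(3, 2, edges=[(1, 3), (3, 4)])
print(m.astype(int))
```

Under these assumptions, the mask would be passed to the encoder's self-attention so that each element only attends to elements it is connected to in the cross-modal semantic graph, rather than to the full sequence.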