Sentiment analysis has proven to be a very useful tool in many social-media applications, which has led to a great surge of research in this field. In this paper, we compile baselines for such research: we explore three different deep-learning-based architectures for multimodal sentiment classification, each improving upon the previous one, and we evaluate these architectures on multiple datasets with fixed train/test partitions. We also discuss some major issues frequently ignored in multimodal sentiment analysis research, such as the role of speaker-exclusive models, the importance of the different modalities, and generalizability. This framework illustrates the different facets of analysis to be considered when performing multimodal sentiment analysis and hence serves as a new benchmark for future research in this emerging field. We compare the methods using empirical data obtained from our experiments. In the future, we plan to focus on extracting semantics from visual features, on cross-modal features, and on fusion.
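To make the speaker-exclusive evaluation setting concrete, the following is a minimal sketch (not the authors' implementation) of a speaker-exclusive train/test partition in Python: all utterances from a given speaker fall entirely in either the training set or the test set, so a model cannot exploit speaker identity. The function name, the tuple-based dataset format, and the choice of held-out speakers are assumptions made for illustration.

    def speaker_exclusive_split(utterances, test_speakers):
        """Partition utterances so that no speaker appears in both sets.

        utterances: iterable of (speaker_id, features, label) tuples
                    (hypothetical format for this sketch).
        test_speakers: set of speaker ids reserved for testing.
        """
        train, test = [], []
        for speaker_id, features, label in utterances:
            # Route the whole utterance by its speaker, never by content,
            # so the two sets share no speakers.
            bucket = test if speaker_id in test_speakers else train
            bucket.append((features, label))
        return train, test

    # Toy usage: speakers "A" and "B" are used for training, "C" for testing.
    data = [("A", [0.1, 0.3], 1), ("B", [0.4, 0.2], 0), ("C", [0.9, 0.5], 1)]
    train_set, test_set = speaker_exclusive_split(data, test_speakers={"C"})

Contrasted with a random utterance-level split, this partition tests whether a model generalizes to unseen speakers rather than memorizing speaker-specific cues, which is the distinction the speaker-exclusive experiments are meant to expose.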