Multimodal sentiment analysis is a rapidly growing field of research. A promising direction in this field is improving the multimodal fusion mechanism. We present a novel feature fusion strategy that proceeds in a hierarchical fashion: it first fuses the modalities two at a time and only then fuses all three modalities. On multimodal sentiment analysis of individual utterances, our strategy outperforms conventional concatenation of features by 1%, which amounts to a 5% reduction in error rate. On utterance-level multimodal sentiment analysis of multi-utterance video clips, where current state-of-the-art techniques incorporate contextual information from other utterances of the same clip, our hierarchical fusion gives up to a 2.4% improvement (almost 10% error rate reduction) over the currently used concatenation. The implementation of our method is publicly available as open-source code.
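The two-stage idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual architecture: the projection weights (`w_at`, `w_av`, `w_tv`, `w_all`), the `tanh` nonlinearity, and the fixed feature dimension are all assumptions chosen for the sketch; in practice these weights would be learned end to end.

```python
import numpy as np

def concat_fusion(a, t, v):
    # Conventional baseline: simply concatenate the unimodal feature vectors.
    return np.concatenate([a, t, v])

def pairwise_fuse(x, y, w):
    # Hypothetical bimodal fusion: project the concatenated pair of
    # modality features back to the common feature dimension.
    return np.tanh(w @ np.concatenate([x, y]))

def hierarchical_fusion(a, t, v, w_at, w_av, w_tv, w_all):
    # Stage 1: fuse the three modalities two at a time.
    f_at = pairwise_fuse(a, t, w_at)  # audio + text
    f_av = pairwise_fuse(a, v, w_av)  # audio + video
    f_tv = pairwise_fuse(t, v, w_tv)  # text + video
    # Stage 2: fuse the three bimodal representations into one trimodal vector.
    return np.tanh(w_all @ np.concatenate([f_at, f_av, f_tv]))

# Toy dimensions and random stand-ins for learned parameters.
d = 4
rng = np.random.default_rng(0)
a, t, v = (rng.standard_normal(d) for _ in range(3))
w_at, w_av, w_tv = (rng.standard_normal((d, 2 * d)) for _ in range(3))
w_all = rng.standard_normal((d, 3 * d))

print(concat_fusion(a, t, v).shape)                                  # (12,)
print(hierarchical_fusion(a, t, v, w_at, w_av, w_tv, w_all).shape)   # (4,)
```

Note the contrast the sketch makes visible: plain concatenation leaves a vector whose size grows with the number of modalities, while the hierarchical scheme keeps a fixed-size representation at every stage, letting each pairwise interaction be modeled explicitly before the final trimodal combination.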