Multimodal sentiment analysis and depression estimation are two important research topics that aim to predict human mental states from multimodal data. Previous research has focused on developing effective fusion strategies for exchanging and integrating mind-related information across modalities. Recently, MLP-based techniques have achieved considerable success in a variety of computer vision tasks. Inspired by this, we explore multimodal processing from a feature-mixing perspective in this study. To this end, we introduce CubeMLP, a multimodal feature-processing framework based entirely on MLPs. CubeMLP consists of three independent MLP units, each of which contains two affine transformations. CubeMLP accepts all relevant modality features as input and mixes them across three axes (sequence, modality, and channel). After feature mixing with CubeMLP, the resulting multimodal features are flattened for task prediction. Our experiments are conducted on the sentiment analysis datasets CMU-MOSI and CMU-MOSEI, and on the depression estimation dataset AVEC2019. The results show that CubeMLP achieves state-of-the-art performance at a much lower computational cost.
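To make the axis-mixing idea concrete, below is a minimal PyTorch sketch under several assumptions not stated in the abstract: the input is an aligned tensor of shape (batch, sequence, modality, channel), each MLP unit is realized as two affine (linear) layers with a nonlinearity in between, and all sizes (hidden width, L, M, D) are hypothetical. This is a conceptual illustration of mixing along three axes, not the authors' implementation, which may arrange activations, residual connections, and normalization differently.

```python
import torch
import torch.nn as nn

class AxisMLP(nn.Module):
    """One MLP unit: two affine transformations applied along a single
    axis of the multimodal tensor, with a nonlinearity in between.
    The hidden width is a hypothetical choice."""
    def __init__(self, dim, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)  # first affine transformation
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)  # second affine transformation

    def forward(self, x):
        # operates on the last axis of x
        return self.fc2(self.act(self.fc1(x)))

class CubeMLPBlock(nn.Module):
    """Mixes a (batch, L, M, D) tensor across the sequence (L),
    modality (M), and channel (D) axes with three independent MLPs."""
    def __init__(self, seq_len, n_modalities, n_channels):
        super().__init__()
        self.seq_mlp = AxisMLP(seq_len)
        self.mod_mlp = AxisMLP(n_modalities)
        self.chn_mlp = AxisMLP(n_channels)

    def forward(self, x):  # x: (B, L, M, D)
        # mix along the sequence axis: move L to the last dim, apply, move back
        x = self.seq_mlp(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # mix along the modality axis
        x = self.mod_mlp(x.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
        # mix along the channel axis (already last)
        x = self.chn_mlp(x)
        return x

# usage: aligned text/audio/visual features stacked on the modality axis
B, L, M, D = 8, 50, 3, 32                  # hypothetical sizes
feats = torch.randn(B, L, M, D)
block = CubeMLPBlock(L, M, D)
mixed = block(feats)                       # (B, L, M, D), all axes mixed
pred_head = nn.Linear(L * M * D, 1)        # flatten mixed features for prediction
score = pred_head(mixed.flatten(1))        # (B, 1) sentiment / severity score
```

Because every mixing step is a pair of affine maps over one axis, the per-block cost grows linearly in each of L, M, and D rather than quadratically in sequence length, which is consistent with the abstract's claim of a much lower computational cost than attention-based fusion.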