Multimodal reasoning, an area of artificial intelligence that aims to make inferences from multimodal signals such as vision, language, and speech, has drawn increasing attention in recent years. People with different personalities may respond differently to the same situation, yet previous studies have ignored such individual personalities. In this work, we introduce a new Personality-aware Human-centric Multimodal Reasoning (Personality-aware HMR) task and construct a corresponding dataset based on the television show The Big Bang Theory. The task is to predict the behavior of a specific person at a specific moment, given multimodal information from the moments before and after it. We annotate each individual's Myers-Briggs Type Indicator (MBTI) and use it in the task to represent personality. We benchmark the task with three baseline methods: two adapted from related tasks and one newly proposed for our task. The experimental results demonstrate that personality can effectively improve the performance of human-centric multimodal reasoning. To address the lack of personality annotations in real-life scenarios, we further introduce an extended task called Personality-predicted HMR and propose corresponding methods that first predict the MBTI personality and then use the predicted personality to aid multimodal reasoning. The experimental results show that our method can accurately predict personality and achieves satisfactory multimodal reasoning performance without relying on personality annotations.