Personality-aware Human-centric Multimodal Reasoning: A New Task - 专知 (Zhuanzhi) Papers


April 5, 2023


Yaochen Zhu, Xiangqing Shen, Rui Xia

Multimodal reasoning, an area of artificial intelligence that aims to make inferences from multimodal signals such as vision, language and speech, has drawn increasing attention in recent years. People with different personalities may respond differently to the same situation, yet previous studies have ignored such individual differences. In this work, we introduce a new Personality-aware Human-centric Multimodal Reasoning (Personality-aware HMR) task and construct a new dataset based on the television show The Big Bang Theory. The task is to predict the behavior of a specific person at a specific moment, given multimodal information from the past and future moments. The Myers-Briggs Type Indicator (MBTI) is annotated and used in the task to represent individuals' personalities. We benchmark the task with three baseline methods: two adapted from related tasks and one newly proposed for our task. The experimental results demonstrate that personality can effectively improve the performance of human-centric multimodal reasoning. To address the lack of personality annotation in real-life scenes, we introduce an extended task called Personality-predicted HMR and propose corresponding methods that first predict the MBTI personality and then use the predicted personality to help multimodal reasoning. The experimental results show that our method can accurately predict personality and achieves satisfactory multimodal reasoning performance without relying on personality annotations.
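To make the task inputs concrete, here is a minimal Python sketch of how one instance might be represented. It is an illustration only, not the authors' data format or method: the `HMRSample` class, its field names, and the `encode_mbti` helper are all hypothetical. The only outside fact it relies on is that an MBTI type is four letters, one from each of the dichotomies E/I, S/N, T/F and J/P, so it can be written as four binary values.

```python
from dataclasses import dataclass
from typing import List

# The four MBTI dichotomies. Within each pair, the first letter maps to 0
# and the second to 1 (an arbitrary convention chosen for this sketch).
MBTI_AXES = ("EI", "SN", "TF", "JP")


def encode_mbti(mbti: str) -> List[int]:
    """Map a four-letter MBTI type such as 'INTJ' to four binary values."""
    mbti = mbti.upper()
    if len(mbti) != 4:
        raise ValueError(f"expected a four-letter MBTI type, got {mbti!r}")
    code = []
    for letter, axis in zip(mbti, MBTI_AXES):
        if letter not in axis:
            raise ValueError(f"{letter!r} is not valid for axis {axis}")
        code.append(axis.index(letter))
    return code


@dataclass
class HMRSample:
    """Hypothetical container for one task instance (all names are assumptions)."""
    person: str                 # whose behavior is to be predicted
    mbti: str                   # annotated (or predicted) MBTI type
    past_context: List[str]     # placeholder for multimodal features of past moments
    future_context: List[str]   # placeholder for multimodal features of future moments

    def personality_vector(self) -> List[int]:
        return encode_mbti(self.mbti)


# Arbitrary example values, not taken from the dataset.
sample = HMRSample(
    person="character_A",
    mbti="INTJ",
    past_context=["<features of earlier moments>"],
    future_context=["<features of later moments>"],
)
print(sample.personality_vector())  # [1, 1, 0, 0]
```

In the Personality-predicted HMR setting described above, the `mbti` field would be filled by a first-stage predictor instead of an annotation. The rest of the structure could stay the same.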

