多模态推理和行动中的聊天GPT (MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action) - 专知论文

会员服务 ·

0

MM-REACT · 多峰值 · Vision · Prompt · ChatGPT ·

2023 年 3 月 20 日

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

翻译：多模态推理和行动中的聊天GPT

Zhengyuan Yang,Linjie Li,Jianfeng Wang,Kevin Lin,Ehsan Azarnasab,Faisal Ahmed,Zicheng Liu,Ce Liu,Michael Zeng,Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action. In this paper, we define and explore a comprehensive list of advanced vision tasks that are intriguing to solve, but may exceed the capabilities of existing vision and vision-language models. To achieve such advanced visual intelligence, MM-REACT introduces a textual prompt design that can represent text descriptions, textualized spatial coordinates, and aligned file names for dense visual signals such as images and videos. MM-REACT's prompt design allows language models to accept, associate, and process multimodal information, thereby facilitating the synergetic combination of ChatGPT and various vision experts. Zero-shot experiments demonstrate MM-REACT's effectiveness in addressing the specified capabilities of interests and its wide application in different scenarios that require advanced visual understanding. Furthermore, we discuss and compare MM-REACT's system paradigm with an alternative approach that extends language models for multimodal scenarios through joint finetuning. Code, demo, video, and visualization are available at https://multimodal-react.github.io/

翻译：我们提出了MM-REACT，这是一种将ChatGPT与一组视觉专家集成以实现多模态推理和行动的系统范例。在本文中，我们定义并探讨了一系列先进的视觉任务，这些任务很有趣，但可能超出了现有视觉和视觉语言模型的能力范围。为了实现这种先进的视觉智能，MM-REACT引入了一种文本提示设计，该设计可以表示文本描述、文本化的空间坐标和对齐的文件名，用于密集的视觉信号，如图像和视频。MM-REACT的提示设计使语言模型能够接受、关联和处理多模态信息，从而促进了ChatGPT和各种视觉专家的协同组合。零-shot实验证明了MM-REACT在处理所需功能方面的有效性以及在需要先进视觉理解的不同场景中的广泛应用。此外，我们讨论并比较了MM-REACT的系统范例与通过联合微调扩展语言模型用于多模态场景的替代方法。代码、演示、视频和可视化可在https://multimodal-react.github.io/上找到。

0

相关内容

MM-REACT

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

专知会员服务

104+阅读 · 2023年4月28日

MM-REACT:提示ChatGPT进行多模态推理和行动

MM-REACT:提示ChatGPT进行多模态推理和行动

专知会员服务

34+阅读 · 2023年3月26日

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

专知会员服务

64+阅读 · 2022年3月6日

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

专知会员服务

49+阅读 · 2020年5月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【斯坦福大学AAAI2020】跨越因果层次的概率推理，Probabilistic Reasoning across the Causal Hierarchy

【斯坦福大学AAAI2020】跨越因果层次的概率推理，Probabilistic Reasoning across the Causal Hierarchy

专知会员服务

46+阅读 · 2020年1月11日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇自动问答（QA）相关论文—复杂序列问答、注意力机制、长短时记忆、文本推理、多因素注意力、主动的问答智能体

【论文推荐】最新六篇自动问答（QA）相关论文—复杂序列问答、注意力机制、长短时记忆、文本推理、多因素注意力、主动的问答智能体

专知

18+阅读 · 2018年2月22日

梯度纳米结构铜应变控制疲劳机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

IMC/Al功能梯度复合材料的搅拌摩擦增材制造、形成机理及力学行为

国家自然科学基金

0+阅读 · 2014年12月31日

面向知识管理的社会化媒体多源信息整合研究

国家自然科学基金

0+阅读 · 2013年12月31日

颜色-运动特征的绑定与视觉意识的关系

国家自然科学基金

0+阅读 · 2013年12月31日

超薄活性层有机太阳能电池研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向个性化推荐的地理信息可视化方法

国家自然科学基金

4+阅读 · 2012年12月31日

基于动作概念的本体知识库及在文本处理上的应用

国家自然科学基金

7+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

复杂环境下智能轮椅的感知与控制

国家自然科学基金

3+阅读 · 2011年12月31日

金属间化合物中合金元素的占位有序化行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT

Arxiv

1+阅读 · 2023年5月11日

Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models

Arxiv

0+阅读 · 2023年5月11日

A Glimpse in ChatGPT Capabilities and its impact for AI research

Arxiv

0+阅读 · 2023年5月10日

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Arxiv

0+阅读 · 2023年5月9日

Multimodal Prompting with Missing Modalities for Visual Recognition

Arxiv

11+阅读 · 2023年3月6日

A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions

Arxiv

16+阅读 · 2023年2月9日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Arxiv

12+阅读 · 2020年8月11日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

VIP会员

文章信息

相关主题

相关VIP内容

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

专知会员服务

104+阅读 · 2023年4月28日

MM-REACT:提示ChatGPT进行多模态推理和行动

MM-REACT:提示ChatGPT进行多模态推理和行动

专知会员服务

34+阅读 · 2023年3月26日

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

【斯坦福Kevin Chen博士论文】视觉、语言和具身AI的多模态表示， Multimodal representations for vision, language, and embodied AI

专知会员服务

64+阅读 · 2022年3月6日

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

【论文推荐】层次知识图谱，Hierarchical Knowledge Graphs: A Novel Information Representation for Exploratory Search Tasks

专知会员服务

49+阅读 · 2020年5月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【斯坦福大学AAAI2020】跨越因果层次的概率推理，Probabilistic Reasoning across the Causal Hierarchy

【斯坦福大学AAAI2020】跨越因果层次的概率推理，Probabilistic Reasoning across the Causal Hierarchy

专知会员服务

46+阅读 · 2020年1月11日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《毁灭算法：解析以色列在加沙的AI军事行动》

【COLT 2025最新教程】语言生成

以机器速度锁定目标：人工智能的能力与局限

【ICML2025】通过在线世界模型规划的持续强化学习

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇自动问答（QA）相关论文—复杂序列问答、注意力机制、长短时记忆、文本推理、多因素注意力、主动的问答智能体

【论文推荐】最新六篇自动问答（QA）相关论文—复杂序列问答、注意力机制、长短时记忆、文本推理、多因素注意力、主动的问答智能体

专知

18+阅读 · 2018年2月22日

相关论文

Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT

Arxiv

1+阅读 · 2023年5月11日

Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models

Arxiv

0+阅读 · 2023年5月11日

A Glimpse in ChatGPT Capabilities and its impact for AI research

Arxiv

0+阅读 · 2023年5月10日

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Arxiv

0+阅读 · 2023年5月9日

Multimodal Prompting with Missing Modalities for Visual Recognition

Arxiv

11+阅读 · 2023年3月6日

A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions

Arxiv

16+阅读 · 2023年2月9日

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

Arxiv

27+阅读 · 2021年1月21日

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue

Arxiv

12+阅读 · 2020年8月11日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

相关基金

梯度纳米结构铜应变控制疲劳机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

IMC/Al功能梯度复合材料的搅拌摩擦增材制造、形成机理及力学行为

国家自然科学基金

0+阅读 · 2014年12月31日

面向知识管理的社会化媒体多源信息整合研究

国家自然科学基金

0+阅读 · 2013年12月31日

颜色-运动特征的绑定与视觉意识的关系

国家自然科学基金

0+阅读 · 2013年12月31日

超薄活性层有机太阳能电池研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向个性化推荐的地理信息可视化方法

国家自然科学基金

4+阅读 · 2012年12月31日

基于动作概念的本体知识库及在文本处理上的应用

国家自然科学基金

7+阅读 · 2012年12月31日

面向属性的CPN建模及On the Fly辅助的测试生成方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

复杂环境下智能轮椅的感知与控制

国家自然科学基金

3+阅读 · 2011年12月31日

金属间化合物中合金元素的占位有序化行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员