Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. These are often considered different tasks and tackled by specialized models. This work shows that we can express a wide spectrum of robot manipulation tasks with multimodal prompts, interleaving textual and visual tokens. We design a transformer-based generalist robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. To train and evaluate VIMA, we develop a new simulation benchmark with thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and four levels of evaluation protocol for systematic generalization. VIMA achieves strong scalability in both model capacity and data size. It outperforms prior SOTA methods in the hardest zero-shot generalization setting by up to $2.9\times$ task success rate given the same training data. With $10\times$ less training data, VIMA still performs $2.7\times$ better than the top competing approach. We open-source all code, pretrained models, dataset, and simulation benchmark at https://vimalabs.github.io.
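To make the multimodal-prompt interface concrete, the sketch below illustrates the high-level recipe described above: a prompt is flattened into one interleaved sequence of text and image tokens, and motor actions are decoded autoregressively, each prediction conditioning on the prompt plus the actions emitted so far. This is an illustration only, not the VIMA implementation; all names here (`TextToken`, `ImageToken`, `PolicyStub`, `rollout`) are hypothetical placeholders.

```python
# Illustrative sketch only -- NOT the VIMA implementation. All names here
# (TextToken, ImageToken, PolicyStub, rollout) are hypothetical placeholders
# that mimic the paper's high-level recipe: a multimodal prompt is one
# interleaved sequence of text and image tokens, and actions are decoded
# autoregressively.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextToken:
    word: str          # a word piece from the instruction

@dataclass
class ImageToken:
    object_crop: str   # placeholder for an image crop of a referenced object

Prompt = List[Union[TextToken, ImageToken]]

# Example multimodal prompt: "Put the <object A> into the <object B>."
prompt: Prompt = [
    TextToken("Put"), TextToken("the"), ImageToken("crop_of_object_A"),
    TextToken("into"), TextToken("the"), ImageToken("crop_of_object_B"),
]

class PolicyStub:
    """Stand-in for a transformer policy that conditions on the prompt and the
    interaction history and emits one primitive action per step."""

    def predict(self, prompt: Prompt, history: List[str]) -> str:
        # A real model would cross-attend to encoded prompt tokens; here we
        # just emit a fixed pick-and-place sequence for illustration.
        plan = ["pick(object_A)", "place(object_B)", "stop"]
        return plan[min(len(history), len(plan) - 1)]

def rollout(policy: PolicyStub, prompt: Prompt, max_steps: int = 10) -> List[str]:
    """Autoregressive decoding loop: each predicted action is appended to the
    history that conditions the next prediction."""
    history: List[str] = []
    for _ in range(max_steps):
        action = policy.predict(prompt, history)
        history.append(action)
        if action == "stop":
            break
    return history

print(rollout(PolicyStub(), prompt))  # -> ['pick(object_A)', 'place(object_B)', 'stop']
```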