Benefiting from the flexibility and compositionality of language, humans naturally use language to command embodied agents to perform complex tasks such as navigation and object manipulation. In this work, we aim to fill the gap in the last mile of embodied agents: object manipulation by following human guidance, e.g., "move the red mug next to the box while keeping it upright." To this end, we introduce an Automatic Manipulation Solver (AMSolver) system and build a Vision-and-Language Manipulation benchmark (VLMbench) on top of it, containing various language instructions for categorized robotic manipulation tasks. Specifically, modular rule-based task templates are created to automatically generate robot demonstrations paired with language instructions, covering diverse object shapes and appearances, action types, and motion constraints. We also develop a keypoint-based model, 6D-CLIPort, that takes multi-view observations and language input and outputs a sequence of 6-degree-of-freedom (DoF) actions. We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.
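To make the described interface concrete, the following is a minimal, purely illustrative Python sketch of the input/output contract one might expect from a keypoint-based 6-DoF policy such as 6D-CLIPort: multi-view RGB-D observations plus a language instruction in, a short sequence of 6-DoF end-effector poses out. The function name, signature, and placeholder logic are assumptions for illustration and are not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): the assumed I/O contract of a
# keypoint-based 6-DoF manipulation policy. All names and shapes are hypothetical.
import numpy as np

def predict_waypoints(rgb_views: list,
                      depth_views: list,
                      instruction: str,
                      num_steps: int = 2) -> np.ndarray:
    """Map multi-view RGB-D observations and a language instruction to a
    sequence of 6-DoF end-effector poses (x, y, z, roll, pitch, yaw).

    A real model would fuse the camera views, ground the instruction, and
    score keypoints; here we only return placeholder poses of the right shape.
    """
    assert len(rgb_views) == len(depth_views) and len(rgb_views) > 0
    return np.zeros((num_steps, 6), dtype=np.float32)

# Example call: two camera views, one instruction, two predicted waypoints.
views_rgb = [np.zeros((128, 128, 3), np.uint8) for _ in range(2)]
views_depth = [np.zeros((128, 128), np.float32) for _ in range(2)]
actions = predict_waypoints(
    views_rgb, views_depth,
    "move the red mug next to the box while keeping it upright")
print(actions.shape)  # (2, 6)
```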