Classical planning systems have made great advances in utilizing rule-based human knowledge to compute accurate plans for service robots, but they face challenges stemming from their strong assumptions of perfect perception and action execution. To tackle these challenges, one solution is to connect the symbolic states and actions generated by classical planners to the robot's sensory observations, thus closing the perception-action loop. This research proposes a visually grounded planning framework, named TPVQA, which leverages Vision-Language Models (VLMs) to detect action failures and verify action affordances, toward enabling successful plan execution. Results from quantitative experiments show that TPVQA surpasses competitive baselines from previous studies in task completion rate.