Grounded understanding of natural language in physical scenes can greatly benefit robots that follow human instructions. In object manipulation scenarios, existing end-to-end models are proficient at understanding semantic concepts but typically cannot handle complex instructions involving spatial relations among multiple objects, which require both reasoning about object-level spatial relations and learning precise pixel-level manipulation affordances. We take an initial step toward this challenge with a decoupled two-stage solution. In the first stage, we propose an object-centric semantic-spatial reasoner that selects the objects relevant to the language-instructed task. The segmentation masks of the selected objects are then fused as additional input to the affordance learning stage. Simply incorporating this inductive bias of relevant objects into a vision-language affordance learning agent effectively boosts its performance on a custom testbed designed for object manipulation with spatially related language instructions.
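The fusion step between the two stages can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's actual code: it assumes the stage-1 reasoner outputs per-object binary masks, which are concatenated onto the RGB observation as extra channels before being fed to the stage-2 affordance network. Function and variable names are illustrative.

```python
import numpy as np

def fuse_masks_with_rgb(rgb, masks):
    """Concatenate binary masks of the selected objects onto the RGB channels.

    rgb:   (H, W, 3) float array in [0, 1]
    masks: list of (H, W) binary arrays, one per object picked by the reasoner
    returns: (H, W, 3 + k) array serving as input to the affordance network
    """
    if masks:
        merged = np.stack(masks, axis=-1).astype(np.float32)
    else:
        # No relevant object selected: pass an all-zero mask channel.
        merged = np.zeros(rgb.shape[:2] + (1,), dtype=np.float32)
    return np.concatenate([rgb.astype(np.float32), merged], axis=-1)

# Example: the reasoner selected two relevant objects for this instruction.
rgb = np.random.rand(64, 64, 3)
masks = [np.zeros((64, 64)), np.ones((64, 64))]
fused = fuse_masks_with_rgb(rgb, masks)
print(fused.shape)  # (64, 64, 5)
```

Keeping the fusion as simple channel concatenation is what makes the inductive bias easy to add to an existing vision-language affordance agent: only the input layer's channel count changes.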
Title: Grounding Object Relations in Language-Conditioned Robotic Manipulation with Semantic-Spatial Reasoning