用于室内环境内长期的构成任务的模块化愿景语言导航和操纵框架 (A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment) - 专知论文

会员服务 ·

0

Vision · 回合 · 易处理的 · INTERACT · 端到端 ·

2021 年 1 月 19 日

A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

翻译：用于室内环境内长期的构成任务的模块化愿景语言导航和操纵框架

Homagni Saha,Fateme Fotouhif,Qisai Liu,Soumik Sarkar

from arxiv, 16 pages

In this paper we propose a new framework - MoViLan (Modular Vision and Language) for execution of visually grounded natural language instructions for day to day indoor household tasks. While several data-driven, end-to-end learning frameworks have been proposed for targeted navigation tasks based on the vision and language modalities, performance on recent benchmark data sets revealed the gap in developing comprehensive techniques for long horizon, compositional tasks (involving manipulation and navigation) with diverse object categories, realistic instructions and visual scenarios with non-reversible state changes. We propose a modular approach to deal with the combined navigation and object interaction problem without the need for strictly aligned vision and language training data (e.g., in the form of expert demonstrated trajectories). Such an approach is a significant departure from the traditional end-to-end techniques in this space and allows for a more tractable training process with separate vision and language data sets. Specifically, we propose a novel geometry-aware mapping technique for cluttered indoor environments, and a language understanding model generalized for household instruction following. We demonstrate a significant increase in success rates for long-horizon, compositional tasks over the baseline on the recently released benchmark data set-ALFRED.

翻译：在本文件中,我们提出了一个新的框架 -- -- MoViLan(Modular ViLan)(Modular Vision and langues),用于执行日常室内家务的视觉天然语言指令。虽然已经根据愿景和语言模式为有针对性的导航任务提出了若干数据驱动的、端到端学习框架,但最近基准数据集的绩效显示,在开发综合长视线技术、包含不同对象类别的构成任务(涉及操纵和导航)、现实指令和视觉情景以及不可逆状态变化等方面,存在着差距。我们提出了一种模块化方法,用以处理综合导航和对象互动问题,而不需要严格一致的愿景和语言培训数据(例如,以专家展示的轨迹为形式)。这种方法大大偏离了这一空间传统的端到端技术,并使得能够利用不同的愿景和语言数据集开展更可移动的培训进程。具体地说,我们提出了用于封闭的室内环境的新型几何测量测量绘图技术,以及随后普及的家庭教学语言理解模型。我们展示了长正弦的成功率大幅提高,比最近发布的基准数据基准A。

0

相关内容

Vision

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

37+阅读 · 2020年3月27日

【DeepMind-牛津-CMU-CVPR2020】无监督文字翻译视频中的视觉基础，Visual Grounding in Video for Unsupervised Word Translation

【DeepMind-牛津-CMU-CVPR2020】无监督文字翻译视频中的视觉基础，Visual Grounding in Video for Unsupervised Word Translation

专知会员服务

13+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【CVPR 2019 | tutorial】自主汽车的感知、预测和大规模数据采集：Perception, Prediction, and Large Scale Data Collection for Autonomous Cars

【CVPR 2019 | tutorial】自主汽车的感知、预测和大规模数据采集：Perception, Prediction, and Large Scale Data Collection for Autonomous Cars

专知会员服务

33+阅读 · 2019年11月28日

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

专知会员服务

10+阅读 · 2019年10月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Autonomous Drone Racing with Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年3月15日

B-GAP: Behavior-Guided Action Prediction and Navigation for Autonomous Driving

Arxiv

0+阅读 · 2021年3月15日

A Collaborative Visual SLAM Framework for Service Robots

Arxiv

0+阅读 · 2021年3月14日

Navigation Framework for a Hybrid Steel Bridge Inspection Robot

Arxiv

0+阅读 · 2021年3月11日

Multi-Task Learning for Dense Prediction Tasks: A Survey

Multi-Task Learning for Dense Prediction Tasks: A Survey

Arxiv

5+阅读 · 2020年9月16日

Joint Monocular 3D Vehicle Detection and Tracking

Joint Monocular 3D Vehicle Detection and Tracking

Arxiv

8+阅读 · 2018年12月2日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Visual Semantic Navigation using Scene Priors

Arxiv

5+阅读 · 2018年10月15日

IQA: Visual Question Answering in Interactive Environments

Arxiv

5+阅读 · 2018年4月5日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

VIP会员

文章信息

相关主题

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

37+阅读 · 2020年3月27日

【DeepMind-牛津-CMU-CVPR2020】无监督文字翻译视频中的视觉基础，Visual Grounding in Video for Unsupervised Word Translation

【DeepMind-牛津-CMU-CVPR2020】无监督文字翻译视频中的视觉基础，Visual Grounding in Video for Unsupervised Word Translation

专知会员服务

13+阅读 · 2020年3月12日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【CVPR 2019 | tutorial】自主汽车的感知、预测和大规模数据采集：Perception, Prediction, and Large Scale Data Collection for Autonomous Cars

【CVPR 2019 | tutorial】自主汽车的感知、预测和大规模数据采集：Perception, Prediction, and Large Scale Data Collection for Autonomous Cars

专知会员服务

33+阅读 · 2019年11月28日

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

Understanding Color and the In-Camera Image Processing Pipeline for Computer Vision 【Michael S. Brown IEEE】韩国 ICCV 2019

专知会员服务

10+阅读 · 2019年10月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】《知识图谱与大语言模型的协同应用》，544页pdf

军事通信系统：安全行动的支柱

《缓解大语言模型（LLMs）幻觉：面向应用的检索增强生成（RAG）、推理与智能体系统综述》

【新书】机器学习系统，2620页pdf

相关资讯

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Autonomous Drone Racing with Deep Reinforcement Learning

Arxiv

0+阅读 · 2021年3月15日

B-GAP: Behavior-Guided Action Prediction and Navigation for Autonomous Driving

Arxiv

0+阅读 · 2021年3月15日

A Collaborative Visual SLAM Framework for Service Robots

Arxiv

0+阅读 · 2021年3月14日

Navigation Framework for a Hybrid Steel Bridge Inspection Robot

Arxiv

0+阅读 · 2021年3月11日

Multi-Task Learning for Dense Prediction Tasks: A Survey

Multi-Task Learning for Dense Prediction Tasks: A Survey

Arxiv

5+阅读 · 2020年9月16日

Joint Monocular 3D Vehicle Detection and Tracking

Joint Monocular 3D Vehicle Detection and Tracking

Arxiv

8+阅读 · 2018年12月2日

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

Arxiv

9+阅读 · 2018年11月25日

Visual Semantic Navigation using Scene Priors

Arxiv

5+阅读 · 2018年10月15日

IQA: Visual Question Answering in Interactive Environments

Arxiv

5+阅读 · 2018年4月5日

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Arxiv

5+阅读 · 2018年4月5日

微信扫码咨询专知VIP会员