Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds. We introduce a novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds. We propose an end-to-end reinforcement learning approach built on a multi-modal architecture that fuses spatial audio-visual information from a binaural audio signal and spatial occupancy maps to encode the features needed to learn a robust navigation policy for our new, more complex task settings. We demonstrate that our approach outperforms the current state of the art, with better generalization to unheard sounds and greater robustness to noisy scenarios, on two challenging 3D-scanned real-world datasets, Replica and Matterport3D, for both the static and dynamic audio-visual navigation benchmarks. Our novel benchmark will be made available at http://dav-nav.cs.uni-freiburg.de.