声音定位从运动中：联合学习声源方向和相机旋转 (Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation) - 专知论文

会员服务 ·

0

估计/估计量 · 有向 · MoDELS · Learning · state-of-the-art ·

2023 年 3 月 20 日

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

翻译：声音定位从运动中：联合学习声源方向和相机旋转

Ziyang Chen,Shengyi Qian,Andrew Owens

from arxiv, Project site: https://ificl.github.io/SLfM/

The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. We learn to solve these tasks solely through self-supervision. A visual model predicts camera rotation from a pair of images, while an audio model predicts the direction of sound sources from binaural sounds. We train these models to generate predictions that agree with one another. At test time, the models can be deployed independently. To obtain a feature representation that is well-suited to solving this challenging problem, we also propose a method for learning an audio-visual representation through cross-view binauralization: estimating binaural sound from one view, given images and sound from another. Our model can successfully estimate accurate rotations on both real and synthetic scenes, and localize sound sources with accuracy competitive with state-of-the-art self-supervised approaches. Project site: https://ificl.github.io/SLfM/

翻译：我们所感知到的图像和声音在我们转动头部时会发生微小但几何一致的变化。在本文中，我们使用这些线索来解决一个问题，我们称之为声音定位从运动中（SLfM）：联合估计相机旋转和声源本地化。我们仅通过自监督学习来学习解决这些任务。一个视觉模型从一对图像中预测相机旋转，同时一个声音模型从双耳声音中预测声源的方向。我们训练这些模型生成相互一致的预测。在测试时，模型可以独立地部署。为了获得适合解决这个具有挑战性问题的特征表示，我们还提出了一种通过跨视角双耳化学习音频-视觉表示的方法：从一个视角估计双耳声音，给定另一个视角的图像和声音。我们的模型可以在真实和合成场景上成功地估计准确的旋转，并以与最先进的自监督方法相媲美的准确性对声源进行本地化。项目网站： https://ificl.github.io/SLfM/

0

相关内容

估计/估计量

估计/估计量

【CVPR2022】以人为中心感知的多模态预训练

【CVPR2022】以人为中心感知的多模态预训练

专知会员服务

30+阅读 · 2022年3月28日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

专知会员服务

38+阅读 · 2020年4月6日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

专知

10+阅读 · 2018年4月12日

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

泡泡机器人SLAM

11+阅读 · 2018年3月31日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

融合多源图像与光流运动的旋转背景下对地运动目标检测研究

国家自然科学基金

2+阅读 · 2014年12月31日

旋转飞行物体的状态估计与轨迹预测

国家自然科学基金

0+阅读 · 2014年12月31日

球面波反射和透射系数频变机制研究及考虑频变的地层Q值估计

国家自然科学基金

0+阅读 · 2013年12月31日

基于自适应精确Cosserat弹性杆的导丝动力学交互研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于单张低精度深度图的实时精确三维曲面重建

国家自然科学基金

0+阅读 · 2012年12月31日

高质量机动目标InISAR三维成像研究

国家自然科学基金

0+阅读 · 2012年12月31日

移动机器人基于三维激光测距的室内场景认知与物体识别

国家自然科学基金

0+阅读 · 2012年12月31日

湍流噪声的声源机制和大涡模拟

国家自然科学基金

0+阅读 · 2011年12月31日

基于视皮层感知机制的生物启发运动特征层次化模型

国家自然科学基金

0+阅读 · 2011年12月31日

基于光流模型的运动分析的研究

国家自然科学基金

0+阅读 · 2009年12月31日

ShapeCoder: Discovering Abstractions for Visual Programs from Unstructured Primitives

Arxiv

0+阅读 · 2023年5月9日

RLocator: Reinforcement Learning for Bug Localization

Arxiv

0+阅读 · 2023年5月9日

Human or Machine: Reflections on Turing-Inspired Testing for the Everyday

Arxiv

0+阅读 · 2023年5月7日

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Arxiv

0+阅读 · 2023年5月5日

Measuring Self-Supervised Representation Quality for Downstream Classification using Discriminative Features

Arxiv

0+阅读 · 2023年5月4日

Deep Neural Network Based Relation Extraction: An Overview

Arxiv

14+阅读 · 2021年1月6日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Commonsense Knowledge Base Completion with Structural and Semantic Context

Commonsense Knowledge Base Completion with Structural and Semantic Context

Arxiv

20+阅读 · 2019年12月19日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing

Arxiv

10+阅读 · 2018年1月29日

VIP会员

文章信息

相关主题

估计/估计量

state-of-the-art

相关VIP内容

【CVPR2022】以人为中心感知的多模态预训练

【CVPR2022】以人为中心感知的多模态预训练

专知会员服务

30+阅读 · 2022年3月28日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，56页ppt，Neural Topological SLAM for Visual Navigation

专知会员服务

14+阅读 · 2020年6月18日

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

【CVPR2020】视觉导航的神经拓扑SLAM，Neural Topological SLAM for Visual Navigation

专知会员服务

52+阅读 · 2020年5月26日

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

学习具有层次标签的图像表示，Learning Representations For Images With Hierarchical Labels

专知会员服务

38+阅读 · 2020年4月6日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

【论文推荐】最新八篇图像描述生成相关论文—比较级对抗学习、正则化RNNs、深层网络、视觉对话、婴儿说话、自我检索

专知

10+阅读 · 2018年4月12日

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

泡泡机器人SLAM

11+阅读 · 2018年3月31日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

相关论文

ShapeCoder: Discovering Abstractions for Visual Programs from Unstructured Primitives

Arxiv

0+阅读 · 2023年5月9日

RLocator: Reinforcement Learning for Bug Localization

Arxiv

0+阅读 · 2023年5月9日

Human or Machine: Reflections on Turing-Inspired Testing for the Everyday

Arxiv

0+阅读 · 2023年5月7日

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Arxiv

0+阅读 · 2023年5月5日

Measuring Self-Supervised Representation Quality for Downstream Classification using Discriminative Features

Arxiv

0+阅读 · 2023年5月4日

Deep Neural Network Based Relation Extraction: An Overview

Arxiv

14+阅读 · 2021年1月6日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Commonsense Knowledge Base Completion with Structural and Semantic Context

Commonsense Knowledge Base Completion with Structural and Semantic Context

Arxiv

20+阅读 · 2019年12月19日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing

Arxiv

10+阅读 · 2018年1月29日

相关基金

融合多源图像与光流运动的旋转背景下对地运动目标检测研究

国家自然科学基金

2+阅读 · 2014年12月31日

旋转飞行物体的状态估计与轨迹预测

国家自然科学基金

0+阅读 · 2014年12月31日

球面波反射和透射系数频变机制研究及考虑频变的地层Q值估计

国家自然科学基金

0+阅读 · 2013年12月31日

基于自适应精确Cosserat弹性杆的导丝动力学交互研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于单张低精度深度图的实时精确三维曲面重建

国家自然科学基金

0+阅读 · 2012年12月31日

高质量机动目标InISAR三维成像研究

国家自然科学基金

0+阅读 · 2012年12月31日

移动机器人基于三维激光测距的室内场景认知与物体识别

国家自然科学基金

0+阅读 · 2012年12月31日

湍流噪声的声源机制和大涡模拟

国家自然科学基金

0+阅读 · 2011年12月31日

基于视皮层感知机制的生物启发运动特征层次化模型

国家自然科学基金

0+阅读 · 2011年12月31日

基于光流模型的运动分析的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员