Binaural sound that matches its visual counterpart is crucial to bringing meaningful and immersive experiences to people in augmented reality (AR) and virtual reality (VR) applications. Recent works have shown that binaural audio can be generated from mono audio using 2D visual information as guidance. Using 3D visual information may allow for a more accurate representation of a virtual audio scene in VR/AR applications. This paper proposes Points2Sound, a multi-modal deep learning model which generates a binaural version of mono audio using 3D point cloud scenes. Specifically, Points2Sound consists of a vision network, which extracts visual features from the point cloud scene, and an audio network operating in the waveform domain, which is conditioned on those features to synthesize the binaural version. Both quantitative and perceptual evaluations indicate that our proposed model is preferred over a reference case based on a recent 2D mono-to-binaural model.
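The following is a minimal illustrative sketch, not the authors' implementation, of the general idea described above: a point-cloud encoder produces a global visual feature that conditions a waveform-domain network mapping mono audio to a two-channel (binaural) output. All module names, layer sizes, and the FiLM-style conditioning are assumptions made for illustration only.

```python
# Hypothetical sketch (assumed architecture, not Points2Sound itself):
# a point-cloud encoder conditions a waveform network to predict binaural audio.
import torch
import torch.nn as nn


class PointCloudEncoder(nn.Module):
    """Toy PointNet-style encoder: shared per-point MLP followed by max pooling."""

    def __init__(self, in_dim=6, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points):               # points: (B, N, 6) -> xyz + rgb per point
        per_point = self.mlp(points)          # (B, N, feat_dim)
        return per_point.max(dim=1).values    # (B, feat_dim) global scene feature


class ConditionedAudioNet(nn.Module):
    """Toy waveform encoder/decoder; the visual feature modulates the bottleneck (FiLM-style)."""

    def __init__(self, feat_dim=128, channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, stride=4, padding=7), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, stride=4, padding=7), nn.ReLU(),
        )
        self.film = nn.Linear(feat_dim, 2 * channels)      # per-channel scale and shift
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=16, stride=4, padding=6), nn.ReLU(),
            nn.ConvTranspose1d(channels, 2, kernel_size=16, stride=4, padding=6),  # 2 = left/right
        )

    def forward(self, mono, visual_feat):                  # mono: (B, 1, T)
        h = self.encoder(mono)                              # (B, C, T')
        scale, shift = self.film(visual_feat).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)   # condition on the 3D scene
        return self.decoder(h)                               # (B, 2, T) binaural estimate


# Usage example with random data (shapes chosen so the strided layers line up).
points = torch.randn(2, 2048, 6)       # two scenes, 2048 points each
mono = torch.randn(2, 1, 16384)        # mono waveform chunks
binaural = ConditionedAudioNet()(mono, PointCloudEncoder()(points))
print(binaural.shape)                   # torch.Size([2, 2, 16384])
```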