We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. Our approach capitalizes on the natural synchronization of the visual and audio modalities to learn models that jointly parse sounds and images, without requiring additional manual supervision. Experimental results on a newly collected MUSIC dataset show that our proposed Mix-and-Separate framework outperforms several baselines on source separation. Qualitative results suggest our model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of sound sources.
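To make the Mix-and-Separate idea concrete, below is a minimal sketch (not the authors' code) of the self-supervised training loop: magnitude spectrograms from two solo videos are summed into a synthetic mixture, and a mask-predicting network conditioned on a per-video visual feature is trained to recover each known solo. The network architecture, feature dimensions, and shapes are illustrative assumptions.

```python
# Hedged sketch of Mix-and-Separate self-supervision; shapes and network are assumptions.
import torch
import torch.nn as nn

class SeparationNet(nn.Module):
    """Predicts a soft spectrogram mask for one source, conditioned on a visual feature."""
    def __init__(self, visual_dim=32):
        super().__init__()
        self.audio = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, visual_dim, 3, padding=1))
        self.fuse = nn.Conv2d(visual_dim, 1, 1)  # 1x1 conv fuses conditioned channels into a mask

    def forward(self, mix_spec, visual_feat):
        a = self.audio(mix_spec.unsqueeze(1))            # (B, C, F, T) audio features
        a = a * visual_feat.unsqueeze(-1).unsqueeze(-1)  # channel-wise visual conditioning
        return torch.sigmoid(self.fuse(a)).squeeze(1)    # soft mask in [0, 1]

net = SeparationNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Random tensors standing in for magnitude spectrograms of two solo videos
spec_a = torch.rand(4, 256, 64)
spec_b = torch.rand(4, 256, 64)
feat_a, feat_b = torch.rand(4, 32), torch.rand(4, 32)   # per-video visual features (assumed)

mix = spec_a + spec_b                                    # synthetic mixture: the only "label" needed
for step in range(10):
    loss = 0.0
    for target, feat in [(spec_a, feat_a), (spec_b, feat_b)]:
        mask = net(mix, feat)
        loss = loss + ((mask * mix - target) ** 2).mean()  # recover each known solo from the mix
    opt.zero_grad(); loss.backward(); opt.step()
```

The key point the sketch illustrates is that no manual source labels are required: because the mixture is constructed synthetically, the original solo recordings themselves serve as ground-truth separation targets.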